
We Benchmarked 7 Load Testing Tools — Here Are the Results


Which load testing tool gives you the most requests per second? The lowest latency? The least memory usage? We ran identical tests across seven tools — including both full frameworks and raw HTTP benchmarkers — to find out. Every tool hit the same Rust/Axum server, the same endpoint, for the same duration, on the same machine.

Headlines

Among full load testing frameworks (with scripting, assertions, and multi-protocol support):

Fastest Framework
145,213 RPS

Fusillade at 1K workers — 1.2x faster than k6, the next-closest framework.

Lowest Latency
0.076ms P50

60x lower than k6. Sub-0.1ms means your measurements are the server, not the tool.

Most Efficient
692 RPS/MB

9.1x more efficient than k6 per MB of RAM. 210 MB total for 1K workers.

Raw HTTP benchmarkers (bombardier, oha) achieved higher raw throughput but lack scripting and assertions. See full rankings below.

Full Rankings

All seven tools ran the same test: 1,000 concurrent connections hitting GET /api/users on a local Rust/Axum server for 60 seconds. Memory was measured via /usr/bin/time -v.

| # | Tool | RPS | P50 | P99 | RAM | RPS/MB |
|---|------|-----|-----|-----|-----|--------|
| 1 | bombardier | 564,944 | 1.50ms | 5.95ms | 39 MB | 14,487 |
| 2 | oha | 523,284 | 1.58ms | 6.07ms | 6,634 MB | 79 |
| 3 | Fusillade | 145,213 | 0.08ms | 0.35ms | 210 MB | 692 |
| 4 | k6 | 121,972 | 4.55ms | 22.50ms | 1,608 MB | 76 |
| 5 | Gatling | 16,606 | 29.00ms | 477ms | 659 MB | 25 |
| 6 | Locust | 2,374 | 22.00ms | 40ms | 116 MB | 20 |
| 7 | Artillery* | 1,000 | <1ms | 3ms | 2,762 MB | 0.4 |

* Artillery uses an arrival-rate model (1,000 req/s by design), not concurrent-user saturation. bombardier and oha are raw HTTP benchmarkers with no scripting engine. All tools achieved a 0.00% error rate across the full test duration.

Among Full Frameworks

bombardier and oha are raw HTTP benchmarkers — fast, but no scripting, no assertions, no multi-step scenarios. When you compare only the tools that offer scripting, assertions, and multi-protocol support, the picture changes.

| # | Tool | RPS | P50 | RAM | RPS/MB |
|---|------|-----|-----|-----|--------|
| 1 | Fusillade | 145,213 | 0.08ms | 210 MB | 692 |
| 2 | k6 | 121,972 | 4.55ms | 1,608 MB | 76 |
| 3 | Gatling | 16,606 | 29.00ms | 659 MB | 25 |
| 4 | Locust | 2,374 | 22.00ms | 116 MB | 20 |
| 5 | Artillery* | 1,000 | <1ms | 2,762 MB | 0.4 |

Fusillade leads in throughput, latency, and efficiency. It delivers the most requests per second while using the least memory of any full framework tested.

Fusillade vs k6: Head to Head

Fusillade and k6 are the two closest competitors in throughput. Both ran with 1,000 virtual users, response body discarding, and status code assertions on every request.

| Metric | Fusillade | k6 | Difference |
|--------|-----------|----|------------|
| Requests/sec | 145,213 | 121,972 | 1.2x |
| Total Requests | 8,712,805 | 7,318,890 | +19% |
| P50 Latency | 0.076ms | 4.55ms | 60x lower |
| P95 Latency | 0.156ms | 13.03ms | 84x lower |
| P99 Latency | 0.345ms | 22.50ms | 65x lower |
| Peak Memory | 210 MB | 1,608 MB | 7.7x less |
| Efficiency | 692 RPS/MB | 76 RPS/MB | 9.1x |
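The multipliers in the table are derived directly from the raw measurements. A quick sketch reproduces them:

```python
# Raw measurements from the 1K-worker runs above.
fusillade = {"rps": 145_213, "ram_mb": 210}
k6 = {"rps": 121_972, "ram_mb": 1_608}

def efficiency(tool: dict) -> float:
    """Requests per second delivered per MB of peak RAM."""
    return tool["rps"] / tool["ram_mb"]

throughput_ratio = fusillade["rps"] / k6["rps"]            # ~1.19x
memory_ratio = k6["ram_mb"] / fusillade["ram_mb"]          # ~7.7x
efficiency_ratio = efficiency(fusillade) / efficiency(k6)  # ~9.1x

print(f"{throughput_ratio:.2f}x faster, {memory_ratio:.1f}x less RAM, "
      f"{efficiency_ratio:.1f}x more efficient")
```

The 1.2x throughput figure is this 1.19 rounded to one decimal place.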

For a deeper comparison covering architecture, scripting, and cloud execution, see our Fusillade vs k6 comparison.

Tool-by-Tool Notes

bombardier — 564,944 RPS

The raw throughput champion. bombardier is a Go-based HTTP benchmarker that does one thing extremely well: saturate connections. At just 39 MB of RAM, it achieved the highest RPS and the best efficiency of any tool. The tradeoff: no scripting, no assertions, no scenarios. It is a benchmarking tool, not a load testing framework.

oha — 523,284 RPS

Another raw benchmarker, written in Rust. Impressive throughput, but used 6.6 GB of RAM — more than any other tool. The high memory is likely due to oha buffering full response bodies in memory for latency histogram computation (it has no discard-body flag). Like bombardier, it is purpose-built for HTTP saturation testing and lacks scripting or assertion capabilities.

Gatling — 16,606 RPS

The JVM-based framework delivered solid throughput for a full framework, but at higher latency (29ms P50) and memory cost (659 MB). Gatling excels at complex scenario modeling with its Scala/Java DSL and built-in HTML reports. CPU usage was the highest of any tool at 1,266%.

Locust — 2,374 RPS

Python's GIL limits Locust to a single CPU core (98% usage), constraining throughput; Locust printed its CPU-ceiling warning during the run. For higher throughput, the Locust docs recommend distributed mode across multiple worker processes. Memory efficiency was decent at 116 MB.

Artillery — 1,000 RPS

Artillery uses an arrival-rate model rather than sustained concurrency. It sent exactly 1,000 requests per second (60,000 total) as configured. Response times were sub-millisecond because actual concurrency stayed low. The 2.7 GB memory footprint for Node.js was notable. Artillery is better suited for realistic user-arrival simulation than raw throughput testing.
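Little's law (average in-flight requests = arrival rate × average latency) explains why the latencies stayed so low. A sketch, where the 0.5ms average service time is an illustrative assumption consistent with the sub-millisecond responses measured above:

```python
# Little's law: L = lambda * W
arrival_rate = 1_000      # req/s, Artillery's configured arrival rate
avg_latency_s = 0.0005    # assumed ~0.5ms average service time (illustrative)
duration_s = 60

in_flight = arrival_rate * avg_latency_s      # average concurrent requests
total_requests = arrival_rate * duration_s    # matches the 60,000 reported

print(f"avg in-flight requests: {in_flight:.1f}")
print(f"total requests: {total_requests:,}")
```

With less than one request in flight on average, the server is never under pressure, so sub-millisecond responses are expected rather than surprising.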

Why Latency Matters More Than Throughput

A load testing tool's own latency directly affects measurement accuracy. If your API responds in 2ms but your tool adds 4.55ms of overhead (k6) or 29ms (Gatling), you cannot tell whether an optimization improved your endpoint from 2ms to 1ms.
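To see how overhead masks server-side improvements, model observed latency as true server latency plus a fixed per-tool overhead (a simplification; real overhead is not perfectly additive):

```python
# Simplified model: observed latency = true server latency + tool overhead.
tool_overhead_ms = {"Fusillade": 0.076, "k6": 4.55, "Gatling": 29.0}

apparent_gain = {}
for tool, overhead in tool_overhead_ms.items():
    before = 2.0 + overhead   # endpoint at 2ms
    after = 1.0 + overhead    # endpoint optimized to 1ms (a real 50% win)
    apparent_gain[tool] = round((before - after) / before * 100)
    print(f"{tool}: {before:.2f}ms -> {after:.2f}ms "
          f"({apparent_gain[tool]}% apparent improvement)")
```

Under this model, a genuine 50% server-side improvement shows up as a 48% change through Fusillade, but only 15% through k6 and 3% through Gatling.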

Fusillade's 0.076ms median latency means the tool itself is invisible in the measurement. The numbers you see are your server's actual performance, not an artifact of the testing tool. This is possible because Fusillade executes HTTP requests directly through Rust's hyper async runtime with zero-copy buffer management — the JavaScript scripting engine handles test logic, but the hot path (connection, send, receive) is pure native code with no GC pauses or language bridge overhead.

Median latency overhead per tool:

| Tool | P50 |
|------|-----|
| Fusillade | 0.08ms |
| bombardier | 1.50ms |
| oha | 1.58ms |
| k6 | 4.55ms |
| Locust | 22.00ms |
| Gatling | 29.00ms |

Memory: The Hidden Cost

Memory usage determines where you can run load tests. A tool that needs 1.6 GB for 1K users (k6) requires a t3.large or bigger. Fusillade at 210 MB runs on a t3.small. At 10K users, the gap widens further and some tools trigger OOM kills.
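A rough way to size instances is to scale each tool's 1K-worker peak linearly. Note this is a naive assumption for illustration only; real per-worker memory may scale better or worse than linearly:

```python
# Peak RAM at 1K workers (measured), naively scaled to 10K workers.
peak_1k_mb = {"Fusillade": 210, "k6": 1_608, "Artillery": 2_762}

projected_10k_gb = {tool: round(mb * 10 / 1024, 1)
                    for tool, mb in peak_1k_mb.items()}

for tool, gb in projected_10k_gb.items():
    print(f"{tool}: ~{gb} GB projected at 10K workers")
```

Even under generous assumptions, a tool needing multiple gigabytes at 1K workers will need a much larger (and pricier) instance class long before 10K.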

| Tool | Peak RAM (1K workers) |
|------|-----------------------|
| bombardier | 39 MB |
| Locust | 116 MB |
| Fusillade | 210 MB |
| Gatling | 659 MB |
| k6 | 1,608 MB |
| Artillery | 2,762 MB |
| oha | 6,634 MB |

Methodology

Transparency matters for benchmarks. Here is exactly how we ran these tests so you can reproduce them.

  • Target: Rust/Axum HTTP server serving a static 800-byte JSON response
  • Concurrency: 1,000 virtual users for all tools
  • Duration: 60 seconds per tool with 10-second cooldowns
  • Assertions: Fusillade and k6 validated status === 200 on every response
  • Memory: Measured via /usr/bin/time -v (maximum resident set size)
  • Execution: Sequential runs, same machine, same server process
  • Connection pooling: All tools used their default pooling
  • Error rate: 0.00% across all tools — every request returned HTTP 200
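Peak RSS can be pulled out of GNU time's -v report programmatically. A sketch, where the sample output below is abbreviated and the command/filename are illustrative:

```python
import re

def peak_rss_mb(time_v_output: str) -> float:
    """Extract maximum resident set size in MB from `/usr/bin/time -v` output."""
    match = re.search(r"Maximum resident set size \(kbytes\): (\d+)", time_v_output)
    return int(match.group(1)) / 1024

# Abbreviated sample of a `/usr/bin/time -v` report (values illustrative).
sample = """\
\tCommand being timed: "k6 run script.js"
\tMaximum resident set size (kbytes): 1646592
"""

print(f"{peak_rss_mb(sample):.0f} MB")
```

GNU time reports RSS in kilobytes, so dividing by 1,024 gives the MB figures used in the tables above.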

| Tool | Version | Runtime |
|------|---------|---------|
| Fusillade | 1.5.1 | Rust (native) |
| k6 | 1.6.1 | Go |
| bombardier | 1.2.6 | Go |
| oha | 1.13.0 | Rust |
| Gatling | 3.14.3 | JVM (Java 21) |
| Locust | 2.43.2 | Python 3.14 |
| Artillery | 2.0.30 | Node.js 25.6 |

System: AMD Ryzen 7 5800X (8 cores / 16 threads), 32 GB DDR4, Arch Linux kernel 6.18.9.

Run your own benchmarks

Install Fusillade and see the numbers on your own infrastructure. 100 minutes of cloud testing per month, free.