
We Benchmarked 7 Load Testing Tools — Here Are the Results


Which load testing tool gives you the most requests per second? The lowest latency? The least memory usage? We ran identical tests across seven tools — including both full frameworks and raw HTTP benchmarkers — to find out. Every tool hit the same Rust/Axum server, the same endpoint, for the same duration, on the same machine.

Headlines

Among full load testing frameworks (with scripting, assertions, and multi-protocol support):

Fastest Framework
145,213 RPS

Fusillade at 1K workers — 1.2x faster than k6, the next-closest framework.

Lowest Latency
0.076ms P50

60x lower than k6. Sub-0.1ms means your measurements are the server, not the tool.

Most Efficient
692 RPS/MB

9.1x more efficient than k6 per MB of RAM. 210 MB total for 1K workers.

Raw HTTP benchmarkers (bombardier, oha) achieved higher raw throughput but lack scripting and assertions. See full rankings below.

Full Rankings

All seven tools ran the same test: 1,000 concurrent connections hitting GET /api/users on a local Rust/Axum server for 60 seconds. Memory was measured via /usr/bin/time -v.

| # | Tool | RPS | P50 | P99 | RAM | RPS/MB |
|---|------|-----|-----|-----|-----|--------|
| 1 | bombardier | 564,944 | 1.50ms | 5.95ms | 39 MB | 14,487 |
| 2 | oha | 523,284 | 1.58ms | 6.07ms | 6,634 MB | 79 |
| 3 | Fusillade | 145,213 | 0.08ms | 0.35ms | 210 MB | 692 |
| 4 | k6 | 121,972 | 4.55ms | 22.50ms | 1,608 MB | 76 |
| 5 | Gatling | 16,606 | 29.00ms | 477ms | 659 MB | 25 |
| 6 | Locust | 2,374 | 22.00ms | 40ms | 116 MB | 20 |
| 7 | Artillery* | 1,000 | <1ms | 3ms | 2,762 MB | 0.4 |

* Artillery uses an arrival-rate model (1,000 req/s by design), not concurrent-user saturation. bombardier and oha are raw HTTP benchmarkers with no scripting engine. All tools achieved a 0.00% error rate across the full test duration.

Among Full Frameworks

bombardier and oha are raw HTTP benchmarkers — fast, but no scripting, no assertions, no multi-step scenarios. When you compare only the tools that offer scripting, assertions, and multi-protocol support, the picture changes.

| # | Tool | RPS | P50 | RAM | RPS/MB |
|---|------|-----|-----|-----|--------|
| 1 | Fusillade | 145,213 | 0.08ms | 210 MB | 692 |
| 2 | k6 | 121,972 | 4.55ms | 1,608 MB | 76 |
| 3 | Gatling | 16,606 | 29.00ms | 659 MB | 25 |
| 4 | Locust | 2,374 | 22.00ms | 116 MB | 20 |
| 5 | Artillery* | 1,000 | <1ms | 2,762 MB | 0.4 |

Fusillade leads in throughput, latency, and efficiency. It delivers the most requests per second while using the least memory of any full framework tested.

Fusillade vs k6: Head to Head

Fusillade and k6 are the two closest competitors in throughput. Both ran with 1,000 virtual users, response body discarding, and status code assertions on every request.

| Metric | Fusillade | k6 | Difference |
|--------|-----------|----|------------|
| Requests/sec | 145,213 | 121,972 | 1.2x |
| Total Requests | 8,712,805 | 7,318,890 | +19% |
| P50 Latency | 0.076ms | 4.55ms | 60x lower |
| P95 Latency | 0.156ms | 13.03ms | 84x lower |
| P99 Latency | 0.345ms | 22.50ms | 65x lower |
| Peak Memory | 210 MB | 1,608 MB | 7.7x less |
| Efficiency | 692 RPS/MB | 76 RPS/MB | 9.1x |
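The multipliers in the table are derived directly from the raw measurements. A quick sketch reproduces them:

```python
# Raw measurements from the 1K-worker runs above.
fusillade = {"rps": 145_213, "ram_mb": 210}
k6 = {"rps": 121_972, "ram_mb": 1_608}

def efficiency(tool: dict) -> float:
    """Requests per second delivered per MB of peak RAM."""
    return tool["rps"] / tool["ram_mb"]

throughput_ratio = fusillade["rps"] / k6["rps"]            # ~1.19x
memory_ratio = k6["ram_mb"] / fusillade["ram_mb"]          # ~7.7x
efficiency_ratio = efficiency(fusillade) / efficiency(k6)  # ~9.1x

print(f"{throughput_ratio:.2f}x faster, {memory_ratio:.1f}x less RAM, "
      f"{efficiency_ratio:.1f}x more efficient")
```

The 1.2x throughput figure is this 1.19 rounded to one decimal place.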

For a deeper comparison covering architecture, scripting, and cloud execution, see our Fusillade vs k6 comparison.

Tool-by-Tool Notes

bombardier — 564,944 RPS

The raw throughput champion. bombardier is a Go-based HTTP benchmarker that does one thing extremely well: saturate connections. At just 39 MB of RAM, it achieved the highest RPS and the best efficiency of any tool. The tradeoff: no scripting, no assertions, no scenarios. It is a benchmarking tool, not a load testing framework.

oha — 523,284 RPS

Another raw benchmarker, written in Rust. Impressive throughput, but used 6.6 GB of RAM — more than any other tool. The high memory is likely due to oha buffering full response bodies in memory for latency histogram computation (it has no discard-body flag). Like bombardier, it is purpose-built for HTTP saturation testing and lacks scripting or assertion capabilities.

Gatling — 16,606 RPS

The JVM-based framework delivered solid throughput for a full framework, but at higher latency (29ms P50) and memory cost (659 MB). Gatling excels at complex scenario modeling with its Scala/Java DSL and built-in HTML reports. CPU usage was the highest of any tool at 1,266%.

Locust — 2,374 RPS

Python's GIL limits Locust to a single CPU core (98% usage), constraining throughput; Locust printed its CPU-ceiling warning during the run. For higher throughput, the Locust docs recommend distributed mode across multiple worker processes. Memory efficiency was decent at 116 MB.

Artillery — 1,000 RPS

Artillery uses an arrival-rate model rather than sustained concurrency. It sent exactly 1,000 requests per second (60,000 total) as configured. Response times were sub-millisecond because actual concurrency stayed low. The 2.7 GB memory footprint for Node.js was notable. Artillery is better suited for realistic user-arrival simulation than raw throughput testing.
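Little's law (average in-flight requests = arrival rate × average latency) explains why the latencies stayed so low. A sketch, where the 0.5ms average service time is an illustrative assumption consistent with the sub-millisecond responses measured above:

```python
# Little's law: L = lambda * W
arrival_rate = 1_000      # req/s, Artillery's configured arrival rate
avg_latency_s = 0.0005    # assumed ~0.5ms average service time (illustrative)
duration_s = 60

in_flight = arrival_rate * avg_latency_s      # average concurrent requests
total_requests = arrival_rate * duration_s    # matches the 60,000 reported

print(f"avg in-flight requests: {in_flight:.1f}")
print(f"total requests: {total_requests:,}")
```

With less than one request in flight on average, the server is never under pressure, so sub-millisecond responses are expected rather than surprising.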

Why Latency Matters More Than Throughput

A load testing tool's own latency directly affects measurement accuracy. If your API responds in 2ms but your tool adds 4.55ms of overhead (k6) or 29ms (Gatling), you cannot tell whether an optimization improved your endpoint from 2ms to 1ms.
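To see how overhead masks server-side improvements, model observed latency as true server latency plus a fixed per-tool overhead (a simplification; real overhead is not perfectly additive):

```python
# Simplified model: observed latency = true server latency + tool overhead.
tool_overhead_ms = {"Fusillade": 0.076, "k6": 4.55, "Gatling": 29.0}

apparent_gain = {}
for tool, overhead in tool_overhead_ms.items():
    before = 2.0 + overhead   # endpoint at 2ms
    after = 1.0 + overhead    # endpoint optimized to 1ms (a real 50% win)
    apparent_gain[tool] = round((before - after) / before * 100)
    print(f"{tool}: {before:.2f}ms -> {after:.2f}ms "
          f"({apparent_gain[tool]}% apparent improvement)")
```

Under this model, a genuine 50% server-side improvement shows up as a 48% change through Fusillade, but only 15% through k6 and 3% through Gatling.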

Fusillade's 0.076ms median latency means the tool itself is invisible in the measurement. The numbers you see are your server's actual performance, not an artifact of the testing tool. This is possible because Fusillade executes HTTP requests directly through Rust's hyper async runtime with zero-copy buffer management — the JavaScript scripting engine handles test logic, but the hot path (connection, send, receive) is pure native code with no GC pauses or language bridge overhead.

Median latency overhead per tool:

| Tool | P50 |
|------|-----|
| Fusillade | 0.08ms |
| bombardier | 1.50ms |
| oha | 1.58ms |
| k6 | 4.55ms |
| Locust | 22.00ms |
| Gatling | 29.00ms |

Memory: The Hidden Cost

Memory usage determines where you can run load tests. A tool that needs 1.6 GB for 1K users (k6) requires a t3.large or bigger. Fusillade at 210 MB runs on a t3.small. At 10K users, the gap widens further and some tools trigger OOM kills.
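A rough way to size instances is to scale each tool's 1K-worker peak linearly. Note this is a naive assumption for illustration only; real per-worker memory may scale better or worse than linearly:

```python
# Peak RAM at 1K workers (measured), naively scaled to 10K workers.
peak_1k_mb = {"Fusillade": 210, "k6": 1_608, "Artillery": 2_762}

projected_10k_gb = {tool: round(mb * 10 / 1024, 1)
                    for tool, mb in peak_1k_mb.items()}

for tool, gb in projected_10k_gb.items():
    print(f"{tool}: ~{gb} GB projected at 10K workers")
```

Even under generous assumptions, a tool needing multiple gigabytes at 1K workers will need a much larger (and pricier) instance class long before 10K.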

| Tool | Peak RAM (1K workers) |
|------|-----------------------|
| bombardier | 39 MB |
| Locust | 116 MB |
| Fusillade | 210 MB |
| Gatling | 659 MB |
| k6 | 1,608 MB |
| Artillery | 2,762 MB |
| oha | 6,634 MB |

Methodology

Transparency matters for benchmarks. Here is exactly how we ran these tests so you can reproduce them.

  • Target: Rust/Axum HTTP server serving a static 800-byte JSON response
  • Concurrency: 1,000 virtual users for all tools
  • Duration: 60 seconds per tool with 10-second cooldowns
  • Assertions: Fusillade and k6 validated status === 200 on every response
  • Memory: Measured via /usr/bin/time -v (maximum resident set size)
  • Execution: Sequential runs, same machine, same server process
  • Connection pooling: All tools used their default pooling
  • Error rate: 0.00% across all tools — every request returned HTTP 200
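Peak RSS can be pulled out of GNU time's -v report programmatically. A sketch, where the sample output below is abbreviated and the command/filename are illustrative:

```python
import re

def peak_rss_mb(time_v_output: str) -> float:
    """Extract maximum resident set size in MB from `/usr/bin/time -v` output."""
    match = re.search(r"Maximum resident set size \(kbytes\): (\d+)", time_v_output)
    return int(match.group(1)) / 1024

# Abbreviated sample of a `/usr/bin/time -v` report (values illustrative).
sample = """\
\tCommand being timed: "k6 run script.js"
\tMaximum resident set size (kbytes): 1646592
"""

print(f"{peak_rss_mb(sample):.0f} MB")
```

GNU time reports RSS in kilobytes, so dividing by 1,024 gives the MB figures used in the tables above.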

| Tool | Version | Runtime |
|------|---------|---------|
| Fusillade | 1.5.1 | Rust (native) |
| k6 | 1.6.1 | Go |
| bombardier | 1.2.6 | Go |
| oha | 1.13.0 | Rust |
| Gatling | 3.14.3 | JVM (Java 21) |
| Locust | 2.43.2 | Python 3.14 |
| Artillery | 2.0.30 | Node.js 25.6 |

System: AMD Ryzen 7 5800X (8 cores / 16 threads), 32 GB DDR4, Arch Linux kernel 6.18.9.

Run your own benchmarks

Install Fusillade and see the numbers on your own infrastructure. 100 minutes of cloud testing per month, free.