While working through some issues this weekend and preparing another blog post, I finally got around to doing some comprehensive benchmarking on Project Mercury. In the process, I discovered that the first bottleneck I hit running tests from my desktop was the local (last-mile) internet connection, so I switched to running the tests from another EC2 instance. This takes the network out of the equation, giving us a real sense of the raw power behind this stack.
For all these tests, I used the Mercury Alpha4 release on a small EC2 instance, loading a staging copy of Mission Bicycles, which is a good "heavy" example in that it has a lot of modules loaded, including Ubercart and Panels. My goal was to measure throughput and response times under various caching configurations, looking for the best results in pages served per second and delivery time.
I started by cutting things all the way back to nothing, and then added each layer of the caching infrastructure, running benchmarks at each point. The results are quite eye-opening. Can you say 2000+ requests per second? Read on for the full story.
Benchmarks Using ApacheBench on Small EC2 Instances
| Step | Concurrency | Caching | Throughput | 95% response time | Observed Load |
| --- | --- | --- | --- | --- | --- |
| 1 | 2 | no caching | 0.8 req/sec | 6880 ms | 3.5 |
| 2 | 5 | page cache | 10 req/sec | 615 ms | 4 |
| 3 | 10 | APC + page cache | 70 req/sec | 209 ms | 6 |
| 4 | 10 | APC + aggressive cache | 140 req/sec | 148 ms | 10 |
| 5 | 25 | APC + CacheRouter + aggressive cache | 322 req/sec | 93 ms | 12 |
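For reference, the numbers above came from ApacheBench runs along these lines. This is a sketch rather than my exact harness: the URL is a placeholder, and the flag values mirror step 5 in the table.

```shell
# Hypothetical target URL -- substitute your own staging host. Run this
# from a second EC2 instance so the last-mile link doesn't become the
# bottleneck.
URL="http://staging.example.com/"
REQUESTS=10000     # -n: total requests per run
CONCURRENCY=25     # -c: simultaneous clients (the Concurrency column above)

# ab ships with Apache (package apache2-utils / httpd-tools).
# Printed rather than executed here, since it needs a live target:
echo ab -n "$REQUESTS" -c "$CONCURRENCY" "$URL"
```

ab reports throughput, mean time per request, and the response-time percentiles used in the table.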
1. No Caching
As we can see, hosted on a small server, stripped of all caching assistance and loaded down with a heavy module stack, Drupal doesn't perform very well. Less than one request per second of throughput pushed the load up to 3.5, and nearly every page response took more than six seconds. Thankfully, you'd only run your site under anything approaching these circumstances in pure development mode.
2. Standard Page Cache
Drupal ships with a standard page cache which uses the database to store a fully-rendered page. It doesn't have any bad side-effects, and everyone is encouraged to use it in production. Implementing this common-sense feature got us up to 10 requests per second, although the load was still high.
3. Adding APC
APC allows PHP to run faster and consume less memory by keeping the code compiled and cached in memory. Adding it into the mix allowed me to ramp up the concurrency on the tests to 10, and we saw 70 requests per second of throughput, with 95% of requests coming back in just over 200ms. The box was basically getting swamped, but still handling a lot of requests quite quickly.
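APC is enabled through php.ini (or a conf.d fragment). A minimal example along these lines; the sizes are reasonable starting points for a small instance, not the values from my setup, and the extension path varies by distro:

```ini
; Load the APC opcode cache (filename/path depends on packaging).
extension = apc.so
; Shared memory for compiled code and user cache data, in MB.
apc.shm_size = 64
; Leave stat checks on so code changes are picked up; setting this to 0
; squeezes out a little more speed if you only deploy via restarts.
apc.stat = 1
```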
4. Aggressive Caching
Drupal's "Aggressive Caching" system cuts back on the amount of processing that needs to occur before/around a pageload, even one from cache. Enabling this basically doubled throughput to 140, and allowed the server to become even more swamped with requests.
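Both cache modes can also be forced from settings.php rather than the admin UI. A sketch using Drupal 6-era constants (CACHE_NORMAL = 1, CACHE_AGGRESSIVE = 2):

```php
<?php
// Force Drupal's page cache on from settings.php.
// 1 = CACHE_NORMAL (step 2 above), 2 = CACHE_AGGRESSIVE (step 4).
$conf['cache'] = 2;
// No minimum cache lifetime; expire cached pages as content changes.
$conf['cache_lifetime'] = 0;
```

Aggressive mode skips module init/exit hooks on cached pages, so check the performance settings page for modules that warn they are incompatible with it.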
5. CacheRouter
One of the most important things to do when running high-performance Drupal, especially in the cloud, is getting your Drupal cache system out of the database. APC (or memcached) can act as a high-speed caching backend for full pages, as well as for application variables, menu trees, views, filter output, and more. This is vital to keeping any large site responsive for logged-in users, and letting Drupal serve cached pages out of memory (as opposed to from MySQL) sped up responses in this test as well.
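With CacheRouter, the module used for step 5 in the table, the swap is a settings.php change. A sketch only: the module path and engine key depend on where the module lives and which version you run.

```php
<?php
// Point Drupal's cache layer at CacheRouter instead of the {cache_*} tables.
$conf['cache_inc'] = './sites/all/modules/cacherouter/cacherouter.inc';
// Route the default cache bin to APC; 'memcache' is the other common engine.
$conf['cacherouter'] = array(
  'default' => array(
    'engine' => 'apc',
  ),
);
```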
With this measure in place, we were doing pretty much everything possible to serve as many cached pages as quickly as possible, and we got upwards of 300 requests per second with very good response times, although the box was so loaded it would likely take 30 seconds or more to deliver a non-cached page or respond to any logged-in user.
6. Mercury (Pressflow and Varnish)
Mercury took throughput up by a factor of about 6, delivered the pages faster, and simultaneously reduced load by almost 100x. This is very important, because it means that while a few URLs take a massive pounding (the typical result of a link from Digg, Drudge or Perez Hilton), the rest of the system can continue to function. The truth is there's just no way Apache, PHP and Drupal can compete with Pressflow and Varnish. It's like the difference between a Cessna and an F-15; they're just different classes of tool.
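The heart of the Varnish side is a small VCL policy: cache everything anonymous, pass logged-in traffic straight through to Apache. A minimal sketch only; Mercury's actual VCL is considerably more involved, the syntax shown is Varnish 3-style (older releases used bare pass;/lookup;), and the cookie pattern is an assumption:

```vcl
backend default {
  .host = "127.0.0.1";
  .port = "8080";   # Apache moved off :80 so Varnish can listen there
}

sub vcl_recv {
  # Drupal session cookies mark logged-in users: never serve them from cache.
  if (req.http.Cookie ~ "SESS") {
    return (pass);
  }
  # Anonymous requests: strip remaining cookies so they can share cache hits.
  unset req.http.Cookie;
  return (lookup);
}
```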
7. Internet-Scale Traffic
Just to see what would happen, I upped the concurrency to 100, and Mercury spat out 50,000 pages in 20.7 seconds for an overall throughput of 2415 requests/second. Response times crept up a bit as, I suspect, the network interfaces were getting saturated, but this is what we call "internet-scale" traffic. I've included the statpr0n below:
Server Software:        Apache/2.2.11
Server Hostname:        ec2-174-129-144-81.compute-1.amazonaws.com
Server Port:            80

Document Path:          /
Document Length:        12972 bytes

Concurrency Level:      100
Time taken for tests:   20.699 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      673107416 bytes
HTML transferred:       648699608 bytes
Requests per second:    2415.55 [#/sec] (mean)
Time per request:       41.398 [ms] (mean)
Time per request:       0.414 [ms] (mean, across all concurrent requests)
Transfer rate:          31756.33 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   17  141.9      6    4270
Processing:     3   23   27.2     13     676
Waiting:        2   12   17.4      7     626
Total:          5   40  143.6     21    4279

Percentage of the requests served within a certain time (ms)
  50%     21
  66%     42
  75%     50
  80%     52
  90%     76
  95%     83
  98%    103
  99%    116
 100%   4279 (longest request)
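As a sanity check, the headline figures can be re-derived from the raw totals in the report. ab's own numbers use a higher-precision internal timer, so the last decimals differ slightly:

```shell
# Re-derive throughput and transfer rate from the ab totals above.
awk 'BEGIN {
  requests = 50000         # Complete requests
  seconds  = 20.699        # Time taken for tests
  bytes    = 673107416     # Total transferred

  printf "requests/sec: %.2f\n", requests / seconds
  printf "KB/sec:       %.2f\n", (bytes / seconds) / 1024
}'
```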
Of course the standard caveats apply: these are only raw benchmarks and don't indicate specific real-world performance. Your mileage may vary.
However, I think walking the caching stack and going from 0.8 to 2415.55 requests per second on the same hardware shows just how much the right stack matters. Literally, compared to an un-configured Drupal install, Mercury handled 3000x as many requests, with 40x less load. Impressive!