Project Mercury Benchmarks: 2000+ Requests Per Second!
While working through some issues this weekend and preparing another blog post, I finally got around to doing some comprehensive benchmarking on Project Mercury. In the process, I discovered that the first bottleneck I hit running tests from my desktop was the local (last-mile) internet connection, so I switched to running the tests from another EC2 instance. This means that network is not a factor in my results, giving us a real sense of the raw power behind this stack.
For all these tests, I used the Mercury Alpha4 release on a small EC2 instance, loading a staging copy of Mission Bicycles, which is a good "heavy" example in that it has a lot of modules loaded, including Ubercart and Panels. My goal was to measure throughput and response times under various caching configurations, angling for the best results in terms of pages served per second and delivery time.
I started by cutting things all the way back to nothing, and then added each layer of the caching infrastructure, running benchmarks at each point. The results are quite eye-opening. Can you say 2000+ requests per second? Read on for the full story.
Benchmarks Using ApacheBench on Small EC2 Instances
| Step | Concurrency | Caching | Throughput | 95% response time | Observed Load |
|---|---|---|---|---|---|
| 1 | 2 | no caching | 0.8 req/sec | 6880ms | 3.5 |
| 2 | 5 | page cache | 10 req/sec | 615ms | 4 |
| 3 | 10 | apc + page cache | 70 req/sec | 209ms | 6 |
| 4 | 10 | apc + aggressive cache | 140 req/sec | 148ms | 10 |
| 5 | 25 | apc + cacherouter + aggressive cache | 322 req/sec | 93ms | 12 |
| 6 | 25 | mercury | 1869 req/sec | 37ms | 0.1 |
| 7 | 100 | mercury | 2415 req/sec | 83ms | 0.1 |
1. No Caching
As we can see, hosted on a small server, stripped of all caching assistance and loaded down with a heavy module stack, Drupal doesn't perform very well. Less than 1 request per second in throughput sent the load up to 3.5, and page response times were taking more than 6 seconds almost 100% of the time. Thankfully, you'd only run your site under anything approaching these circumstances in pure development mode.
2. Standard Page Cache
Drupal ships with a standard page cache which uses the database to store a fully-rendered page. It doesn't have any bad side-effects, and everyone is encouraged to use it in production. Implementing this common-sense feature got us up to 10 requests per second, although the load was still high.
3. Adding APC
APC allows PHP to run faster and consume less memory by keeping the code compiled and cached in memory. Adding it into the mix allowed me to ramp up the concurrency on the tests to 10, and we saw 70 requests per second in throughput with most of the requests coming back in under 150ms. The box was basically getting swamped, but still handling a lot of requests quite quickly.
4. Aggressive Caching
Drupal's "Aggressive Caching" system cuts back on the amount of processing that needs to occur before/around a pageload, even one from cache. Enabling this basically doubled throughput to 140 requests per second, and allowed the server to become even more swamped with requests.
5. Cacherouter
One of the most important things to do when running high-performance Drupal, especially in the cloud, is to get your Drupal cache system out of the database. APC (or memcached) can act as a high-speed caching backend for full pages, as well as for application variables, menu trees, views, filter output, and more. This is vital to keeping any large site responsive for logged-in users, and allowing Drupal to serve cached pages out of memory (as opposed to from MySQL) sped up responses for this test as well.
With this measure in place, we are doing pretty much everything possible to serve as many cached pages as quickly as possible, and we got upwards of 300 requests per second and very good response times, although the box was so loaded it probably would take 30 seconds or more to deliver a non-cached page or respond to any logged-in user.
6. Mercury (Pressflow and Varnish)
Mercury took throughput up by a factor of about 6, delivered the pages faster, and simultaneously reduced load by more than 100x (from 12 to 0.1). This is very important, because it means that while a few URLs are taking a massive pounding (typical results from a link on Digg, Drudge or Perez Hilton), the rest of the system can continue to function. The truth is there's just no way Apache, PHP and Drupal can compete with Pressflow and Varnish. It's like the difference between a Cessna and an F-15; they're just different classes of tool.
7. Internet-Scale Traffic
Just to see what would happen, I upped the concurrency to 100 and Mercury spat out 50,000 pages in 20.7 seconds for an overall throughput of 2415 requests/second. Response times slipped a bit (the 95% time rose from 37ms to 83ms), I think because the network interfaces were getting saturated, but this is what we call "internet-scale" traffic. I've included the statpr0n below:
Server Software: Apache/2.2.11
Server Hostname: ec2-174-129-144-81.compute-1.amazonaws.com
Server Port: 80

Document Path: /
Document Length: 12972 bytes

Concurrency Level: 100
Time taken for tests: 20.699 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Total transferred: 673107416 bytes
HTML transferred: 648699608 bytes
Requests per second: 2415.55 [#/sec] (mean)
Time per request: 41.398 [ms] (mean)
Time per request: 0.414 [ms] (mean, across all concurrent requests)
Transfer rate: 31756.33 [Kbytes/sec] received

Connection Times (ms)

| | min | mean | +/-sd | median | max |
|---|---|---|---|---|---|
| Connect | 2 | 17 | 141.9 | 6 | 4270 |
| Processing | 3 | 23 | 27.2 | 13 | 676 |
| Waiting | 2 | 12 | 17.4 | 7 | 626 |
| Total | 5 | 40 | 143.6 | 21 | 4279 |

Percentage of the requests served within a certain time (ms)
50% 21
66% 42
75% 50
80% 52
90% 76
95% 83
98% 103
99% 116
100% 4279 (longest request)
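The headline numbers in that report can be re-derived from its own raw counts. The sketch below is only a sanity check in Python, using figures copied from the ApacheBench output above; the variable names are mine, and small differences from the printed values are expected because ab rounds the elapsed time to three decimals.

```python
# Figures copied from the ApacheBench report above.
concurrency = 100
complete_requests = 50_000
elapsed_s = 20.699
total_bytes = 673_107_416

# Throughput: completed requests divided by wall-clock time.
rps = complete_requests / elapsed_s

# Mean time per request as seen by one of the concurrent clients.
per_request_ms = concurrency / rps * 1000

# Mean time per request across all concurrent clients.
per_request_all_ms = 1000 / rps

# Transfer rate in the unit ab reports (Kbytes/sec) and in megabits/sec.
transfer_kbytes_s = total_bytes / 1024 / elapsed_s
transfer_mbit_s = total_bytes * 8 / 1_000_000 / elapsed_s

print(f"requests/sec:          {rps:.2f}")
print(f"ms/request (client):   {per_request_ms:.3f}")
print(f"ms/request (overall):  {per_request_all_ms:.3f}")
print(f"transfer Kbytes/sec:   {transfer_kbytes_s:.2f}")
print(f"transfer Mbit/sec:     {transfer_mbit_s:.1f}")
```

The Mbit/sec line is there because it gives a rough sense of how much bandwidth the test was pushing between the two instances, which is the context for the network saturation guess.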
Of course the standard caveats apply: these are only raw benchmarks and don't indicate specific real-world performance. Your mileage may vary.
However, I think walking the caching stack and going from 0.8 to 2415.55 requests per second on the same hardware shows just how much the right stack matters. Compared to an unconfigured Drupal install, Mercury handled about 3000x as many requests with roughly 35x less load (3.5 down to 0.1). Impressive!
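For anyone who wants to see where each layer's gain comes from, here is a small Python sketch that walks the throughput column of the benchmark table and prints the gain over the previous step and over the uncached baseline. The numbers are copied from the table and the ab report; the function and variable names are only illustrative.

```python
# Throughput (req/sec) per step, copied from the benchmark table.
steps = [
    ("no caching", 0.8),
    ("page cache", 10),
    ("apc + page cache", 70),
    ("apc + aggressive cache", 140),
    ("apc + cacherouter + aggressive cache", 322),
    ("mercury (concurrency 25)", 1869),
    ("mercury (concurrency 100)", 2415.55),
]


def speedups(rows):
    """Return (label, gain over previous step, gain over baseline) per row."""
    baseline = rows[0][1]
    previous = baseline
    result = []
    for label, rps in rows:
        result.append((label, rps / previous, rps / baseline))
        previous = rps
    return result


for label, step_gain, total_gain in speedups(steps):
    print(f"{label:38s} x{step_gain:6.1f} vs previous  x{total_gain:8.1f} vs baseline")

# Observed load: 3.5 with no caching, 0.1 under Mercury.
load_ratio = 3.5 / 0.1
print(f"load ratio, no caching vs mercury: x{load_ratio:.0f}")
```

Keep in mind that concurrency was raised between steps, so these ratios compare each configuration near its own ceiling rather than under identical test conditions.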


Not just Drupal + Varnish
It should be noted that stages six and seven do not merely add Varnish in front of Drupal. Project Mercury switches out Drupal 6 core for Pressflow 6. Pressflow 6 has extensive differences from standard Drupal 6 designed to support Varnish. Drupal 7 will add support similar to Pressflow 6's.
Yes
This is true. You can't just stick Varnish on Drupal 6. I will update the post to be more clear about that point! (and to fix some grammar)
http://www.chapterthree.com | http://www.outlandishjosh.com
Looking forward to more
Looking forward to more info about D6 + Varnish :-)
Benchmarks really only test varnish
I think it is worth noting that many of the other performance tweaks and caching layers in Mercury are made pretty redundant in this particular benchmark (although of course they would all add value in most real site patterns), because Pressflow+Varnish allows pages to be served completely from the Varnish cache (i.e. RAM). Hence, because there are no authenticated users (or cache invalidation) with the ab benchmarks, neither Apache nor Drupal is involved at all beyond the first request, i.e. a similarly sized page served by Plone, Wordpress or static HTML should get near-identical results.
Indeed
This is a valid point, but what makes these results interesting is precisely the origin point. Drupal's power as a CMS lets people run sites that just aren't possible in WP, or could require too much specialized knowledge to build in Plone, and would be nightmarish to maintain as static files. With Pressflow, you can use Drupal with Varnish, and that's a big win.
Also, the other enhancements on the stack are still germane: they help us ensure that during this kind of barrage, the rest of the site remains functional/snappy.
It's true that the top-line number here is a raw power benchmark of Varnish, but knowing that these are the numbers you can hit, and that with server load staying low you can still generate other unique pages on a good timetable is pretty important IMHO.
This was also my original
This was also my original reaction, to be honest. The benchmarks really only test Varnish, and no longer test Drupal, PHP, MySQL or Apache. That is still GREAT because in some cases, one's website can be cached extensively with Varnish. In many cases it simply can't and in that case we're back to serving slow non-cached pages at 1 second/page (or whatnot). In those cases, Varnish still helps, but if you do the math, non-cached pages will be what you are concerned about. Here, the "real" solution is probably horizontal scaling with memcached (or vertical scaling) and Varnish will start to matter less compared to the ideal scenario used in these benchmarks. Nothing new or shocking here, but important to keep in mind when looking at the data. At the end of the day, all performance improvements help, and clearly, this is a significant one.
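"Doing the math" here comes down to how many requests miss the cache. The Python sketch below is a rough model, not a measurement: the 1 request/second backend capacity follows the "1 second/page" figure in the comment above under the assumption of a single worker, and the hit ratios are hypothetical.

```python
def backend_rps(total_rps, hit_ratio):
    """Requests per second that miss the cache and must be built by the backend."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - hit_ratio)


def max_total_rps(backend_capacity_rps, hit_ratio):
    """Highest overall request rate the backend can sustain at this hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    if miss_ratio == 0.0:
        return float("inf")  # every request is a cache hit
    return backend_capacity_rps / miss_ratio


if __name__ == "__main__":
    # Assumption: the backend can build about 1 uncached page per second.
    for hit_ratio in (0.0, 0.5, 0.9, 0.99):
        ceiling = max_total_rps(1.0, hit_ratio)
        print(f"hit ratio {hit_ratio:4.2f} -> sustainable total of about {ceiling:6.1f} req/sec")
```

Under these assumptions the ceiling is set by the miss rate rather than by how fast the cache itself can answer, which is the commenter's point about non-cached pages.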
For authenticated user
For authenticated user caching, I have been doing some D7 work on a drupal_render() cache. Am blocked by a couple of thorny DBTNG issues.
An eventual, long-term goal
An eventual, long-term goal ought to be serving almost all pages from edge caches -- even to authenticated users -- and simply altering or including the user-variant content as necessary. There's nothing sacred about serving only anonymous users from the edge cache; it's just hard to do otherwise.
Nice writeup. The narrative
Nice writeup. The narrative was especially helpful.
Comments
Josh
Note that aggressive caching is now removed from D7. So perhaps taking out aggressive caching would be appropriate for future benchmarks.
David
Is the Pressflow cache header patch based on this 2bits article, or on a backport of D7's changes to cache headers?
Drupal performance tuning, development, customization and consulting: 2bits.com, Inc.
Personal blog: Baheyeldin.com.
Khalid, Aggressive page
Khalid,
Aggressive page caching has not been removed from core 7, it was merely cleaned-up and renamed. The ability not to call hook_boot() and hook_exit() is still there ($conf['page_cache_invoke_hooks'] = FALSE), and necessary for Varnish to be able to cache pages for anonymous users.
Damien Tournoud
http://drupalfr.org
Pressflow uses the same
Pressflow uses the same system as Drupal 7's lazy session creation and (cache) header management. This is a big departure from Drupal 6's code, not a minor patch. There were several cycles of updating the D7 patch and updating PF5/6. We are currently running Pressflow's session and header code on Drupal.org. We're waiting on a better configuration for Squid or moving to Varnish before we take full advantage of it, though.
Well done!
My private obsession with making Drupal sites run fast was greatly inspired by the many performance-related patches for 5.x and by many great 2bits.com articles, and now credit also goes to the people behind Pressflow and the Mercury project. Just a day before Pressflow was opened for download by anyone, we wrote to David Strauss that we are testing, and will post results of, using Ncache on the front and Nginx as a backend. We are now collecting some real-world usage examples to get a good base for a benchmark similar to the one posted here by Josh. Thanks guys!
~Grace -- Turnkey Drupal Hosting on Steroids -- http://omega8.cc
Would love to see us do some
Would love to see us do some tests with Drupal 7 to identify what patches we need (and can) move from Pressflow to Drupal 7. We already did a lot of work to make Varnish work out of the box with Drupal 7 but there is more we can do ... we have to make sure that Drupal 7 scales, and that it is at least as fast as Drupal 6.
I don't believe Drupal 7's
I don't believe Drupal 7's Varnish support is in any way less polished than Pressflow's. If anything, there are outstanding Drupal 7 changes that still need backporting to Pressflow.
Catch's Patch
When it comes to running high-performance sites, I think the single most important patch out there is this one. Adding a reverse proxy (and CDN) can extend performance for cached/cacheable pages, but those pages still need to be generated by Drupal. Allowing for a non-database object-level cache will speed that up quite a bit, as well as making it possible to build much faster sites for logged-in users.
There's a bit of a tension between how well Drupal does as a simple "out of the box" tool deployed on low-end shared hosting, and how many tools there are in its system for running in high-performance environments, which require root access on at least one server, and more intimate knowledge of other parts of the stack.
I tend to agree with Catch's sentiment here:
I think as we move more good features into core, it's only natural that it will be a bit slower, at least at first. Drupal 7's core is like Drupal 6 + CCK, so there's a non-trivial difference in complexity right there. Perhaps beyond the code freeze, we can keep an open mind about performance-enhancing patches that don't alter any APIs?
non-cached pages on pressflow too?
It's not clear from the numbers whether the uncached pages were done on pressflow as well or vanilla Drupal core? Any chance of a comparison there?
It was Drupal core
Though I don't expect a huge amount of variance from Pressflow. This isn't making use of any other advantages (e.g. mysql replication, etc) and the actual code differences to run Varnish-friendly headers, etc, are small. Shouldn't really impact performance much at all.
There is internal work at
There is internal work at Four Kitchens to deliver most page components through ESI, which should considerably accelerate pages for even authenticated users.
Ooooooh!
That would be like block caching on steroids.
Yep, I'm building it
Yep, I'm building it directly into the block caching system. Blocks are the perfect units for integrating with ESI. The extra win is that pages can have long cache lives even if some blocks (like the "latest forum posts") don't.
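To make the idea concrete, here is a toy Python model of what an ESI-capable proxy does: the page shell and each block are cached separately with their own lifetimes, and the blocks are stitched in at delivery time. This is only a sketch of the concept, not the Four Kitchens implementation or Varnish's behaviour; the class, function names and TTL values are all hypothetical, and only the `<esi:include src="..."/>` tag form comes from the ESI language itself.

```python
import re
import time

# Matches the self-closing include tag used by Edge Side Includes.
ESI_TAG = re.compile(r'<esi:include src="([^"]+)"\s*/>')


class TtlCache:
    """Tiny in-memory cache where every entry carries its own lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_render(self, key, ttl_seconds, render):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, serve from cache
        value = render()
        self._entries[key] = (now + ttl_seconds, value)
        return value


def assemble(page_key, cache, render_page, render_block,
             page_ttl=3600, block_ttl=30):
    """Return the page with every esi:include tag replaced by its block."""
    shell = cache.get_or_render(page_key, page_ttl,
                                lambda: render_page(page_key))

    def expand(match):
        src = match.group(1)
        return cache.get_or_render(src, block_ttl,
                                   lambda: render_block(src))

    return ESI_TAG.sub(expand, shell)
```

With a long `page_ttl` and a short `block_ttl`, the expensive page shell is built once while a block such as "latest forum posts" is refreshed on its own schedule.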
ESI? Can you fill me in on
ESI? Can you fill me in on what that is? Sorry if it is a basic question...
Edge-side includes:
Edge-side includes: http://en.wikipedia.org/wiki/Edge_Side_Includes
Exactamundo!
Exactly. That lets you keep static-y pages around for a long time, and also gets you out of a bunch of active-cache-invalidation pickles (potentially).
I'm hoping we can get something equivalent to the block cache system into Panels as well, and integrate that the same way.
Ajaxify Regions
New module that takes the pain out of using authcache, http://drupal.org/project/ajaxify_regions. Using this one could use boost for limited support of an authenticated page cache. This could merge boost & authcache into one. It's also very applicable to Edge-side includes as the Ajaxify Regions module already slices up the page. It seems like we are starting to converge towards a page cache with the "extras" retrieved in a second request.
WoW!
With every comment it sounds better and better! Now this thread/group is my start-page =)
~Grace
Hot graphs
I did a graph to help promote this on my blog. Came out looking HOT!
Could you please share with
Could you please share with us what flags you used for ApacheBench when you got these numbers?
I'm doing some experimenting on my own server just to see how it compares. I've never really played with Apache Bench before, and I'm noticing that, for example, my "Requests per second" value increases by over 100% if I use AB's "-k" flag to use KeepAlive connections, without making any changes on the server. Clearly the flags used when AB is called are significant.
Don't use keepalive
You probably don't want to use keepalive in these tests, as that causes each concurrent thread to maintain its connection, which is not the case in real-world traffic. The simple syntax I used is:
ab -c {concurrency level} -t 60 http://domain.com
This is a bare-bones/brute-force benchmarking command. Do NOT run at -c 100 (my last result) unless you know what you are doing, and NEVER run benchmarks against production servers. This is an easy way to accidentally crash a box.
Please use benchmarking tools with caution. Hitting an un-tuned server with 100 concurrent threads from ApacheBench can easily send the system into swap, especially on a server-to-server connection where networking won't impose a de facto cap. If this happens, you may need to wait several minutes (or hard reboot) to recover. Start with one thread and ramp up from there.
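One way to follow the "start with one thread and ramp up" advice is to generate the command lines ahead of time and run them one by one while watching server load. The Python sketch below only builds and prints the commands; it does not execute anything. The `-c` and `-t` flags are the ones used above, while the concurrency levels and the staging URL are placeholders of my own.

```python
import shlex


def ab_ramp_commands(url, levels=(1, 2, 5, 10, 25), seconds=60):
    """Build one ab command per concurrency level, lowest level first."""
    commands = []
    for level in sorted(levels):
        if level < 1:
            raise ValueError("concurrency must be at least 1")
        # -c sets concurrency, -t caps the run time in seconds.
        commands.append(["ab", "-c", str(level), "-t", str(seconds), url])
    return commands


if __name__ == "__main__":
    # Hypothetical staging host; never point this at a production server.
    for command in ab_ramp_commands("http://staging.example.com/"):
        print(shlex.join(command))
```

Printing rather than running keeps a human in the loop: check the load after each level, and stop ramping as soon as response times or swap usage start to climb.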
You probably don't want to
Do you think so? Some quick research reveals this Wikipedia article which implies that the major browsers have supported persistent connections for some time, and this one states "In HTTP 1.1 all connections are considered persistent, unless declared otherwise."
The objective is to
The objective is to replicate multiple users grabbing a single page simultaneously, rather than a single user grabbing the same page repeatedly.
Hence the need to treat each request as separate.
Alan
Well, ideally, we'd be
Well, ideally, we'd be simulating a user downloading a page, plus all linked CSS and JS files and embedded images and so on using KeepAlive, just as a web browser would. Apache Bench is only downloading the page, then, and not the linked/embedded files?
Maybe I jumped to conclusions on how AB works. But docs on Apache's site are sparse, and it doesn't seem to have a man page on my machine…
EDIT: Scratch that; it has a man page. (I typed "man ab" in the wrong terminal tab previously…) However, the man page is exactly the same as the page linked above… Little info on how it actually works.
And ideally...
If you're running this in the kind of environment where 2000 req/sec is a going concern, nothing but the drupal page would be coming off this layer. Everything else would be in a CDN. ;)
There are other tools we can use here. Jmeter allows you to easily fetch all related files from an HTML page, for instance. I am using ab in this case because running a jmeter test from my desktop maxes out my local DSL well before we start testing the virtue of the Mercury stack.
Additionally, static content
Additionally, static content would be trivially cached in Varnish as well, even with no CDN.
Test PressFlow on Acquia Hosting?
I'd love to take PressFlow for a spin on Acquia Hosting to see how it scales horizontally on AWS. Something we could experiment with.
Definitely
I've been talking a bit about this w/Kieran and Chuck. I think it can help a lot w/the Fields project, as putting this in on a shared cluster type system would be a really big win.
One of the near-term todos is getting a 64-bit version of this out, which is important for doing these in higher-end production environments, but please feel free to run w/it.
Durr, PressFlow, not Mercury
Of course, you can run PressFlow right now. I'm sure the engineering squad over there is more than capable of configuring Varnish on their own. :P
I discussed it with my team
I discussed it with my team at Acquia and we'll work towards adding PressFlow to our list of distributions/applications to run continuous benchmarks with. We probably won't be able to work on this until after DrupalCon, but we'll report back as soon as we have some data to share.
Hello Josh, What kind of
Hello Josh,
What kind of security levels will Mercury have? Recently one of my non drupal sites was hammered with a DDOS attack, and that was when I learned how nasty they can be.
Reading up on Amazon AWS I notice you should handle this in your application layer:
http://developer.amazonwebservices.com/connect/message.jspa?messageID=13...
http://developer.amazonwebservices.com/connect/message.jspa?messageID=13...
Review Critical
ClipGlobe - World Travel
DDOS is rough
It's very difficult to deal with a true DDoS attack. If initiated from a botnet, it's virtually impossible to distinguish the DDoS from "real" traffic. I've spoken with the aiCache folks a bit, but have yet to use their tool on any live sites. They have some benchmarks which show even higher performance than Varnish, and if they offer nice tuning options for DoS situations, that would be another thing that justifies their commercial price tag.
http://www.chapterthree.com | http://www.outlandishjosh.com
Nice article. I'm just about
Nice article. I'm just about to start running benchmarks against one of my sites and I'm just trying to get to grips with tools like ab and httperf. I'm interested to know why you increased the concurrency in your ab command each time you ran a different test? I would have thought that you would keep the test conditions the same for each step of the process, otherwise, how can you compare one set of results against the other?
Tom - www.kirkdesigns.co.uk
Increasing Concurrency
I did start by running the same concurrency levels, but was looking to "max out" each stack configuration. Basically, the -c flag for ab is a rough metric for tuning how hard/fast it benchmarks. The critical metric for knowing how well your stack will hold up is requests/second.
This is also the critical metric to try and determine from your existing traffic stats. Typically site-owners talk in "visits per month" which is unfortunately almost meaningless when trying to assess performance. What you need to try and figure out is your peak throughput. A site may have 10M visits a month, but the odds that they're evenly distributed through that time are slim-to-nil.
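As a rough illustration of why "visits per month" says so little, the Python sketch below converts a monthly figure into an evenly spread average and then scales it by an assumed peak factor. The 10M visits figure is from the paragraph above; the pages-per-visit and peak-to-average values are hypothetical and should come from your own traffic stats.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(visits_per_month, pages_per_visit=1.0, days_in_month=30):
    """Average page requests per second if traffic were spread perfectly evenly."""
    return visits_per_month * pages_per_visit / (days_in_month * SECONDS_PER_DAY)


def estimated_peak_rps(visits_per_month, pages_per_visit, peak_factor,
                       days_in_month=30):
    """Scale the even average by an assumed peak-to-average ratio."""
    return average_rps(visits_per_month, pages_per_visit, days_in_month) * peak_factor


if __name__ == "__main__":
    # 10M visits/month, spread evenly, one page per visit.
    print(f"even average: {average_rps(10_000_000):.2f} req/sec")
    # Assumptions: 3 pages per visit and a peak 10x above the average.
    print(f"assumed peak: {estimated_peak_rps(10_000_000, 3.0, 10.0):.1f} req/sec")
```

The even average comes out to only a few requests per second; it is the peak factor, which has to be measured rather than guessed, that decides what the stack must survive.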
http://www.chapterthree.com | http://www.outlandishjosh.com
Care to share your jmx?
We (at Acquia) are about to start some D7 benchmarking, and would love a leg up by looking at your jmx.
Did you handle authenticated users in these tests? What about writes?
Get in touch :)
-Jacob
It's ab!
It's apache bench at the moment, but let's collaborate on this stuff.
http://www.chapterthree.com | http://www.outlandishjosh.com