There's been some great work on the Boost module lately, with a lot of functionality added, especially the pre-caching crawler. Combined with nginx's speed advantage over Apache at serving static files, it makes a great pairing.
However, nginx does have built-in caching. I've never used it, but I've seen some folks discussing how to do it on the mailing list. Has anyone used it for Drupal?
Like a lot of folks, I was drawn to nginx because of its superiority in low memory situations like a 256 or 360MB Xen slice from Slicehost or Linode, so I'm not sure that I have enough extra memory for it to help. And even with the static files created by Boost, there should be some OS level memory caching. Also Boost has superior options for expiring pages.
But I'm still curious if anyone has any experiences to share or has an nginx config for nginx native caching. Thanks.
Comments
@brianmercer
Yes, we are using ncache and still testing different live setups, to compare with Varnish performance and Pressflow compatibility. I think we will be able to share the results in the next two weeks.
~Grace
There's a long thread about
There's a long thread about adding the proper headers to allow reverse proxy caching here:
http://drupal.org/node/147310
Yes,
It is also often mentioned in the High Performance group on g.d.o. Pressflow has everything we need to use a reverse proxy cache. Now we only need some good benchmarks to compare Varnish and Ncache results.
This is my preferred thread there: http://groups.drupal.org/node/25617
~Grace
I noticed the post at the end
I noticed the post at the end of that thread by David Strauss from Four Kitchens. I'm not ready to move to Pressflow, but I'm definitely interested in the D7 patches from that thread that he's backporting to D6. I know you're using Pressflow so you're ahead of the game.
With nginx it's going to be a little different from Varnish with Pressflow. There shouldn't be any reason to cache static files with nginx like there might be with Varnish, since nginx should be able to serve them directly with comparable efficiency. We're not using Apache as a backend the way a Varnish setup does, and OS-level caching should suffice for static files served by nginx directly.
I don't know much about the ncache project, but it also seems to be geared more towards caching a backend server like Apache, and most of its caching functionality has been incorporated into the nginx main branch since about 0.7.50.
So what I'm interested in is just caching the dynamic requests that are proxied to fastcgi. However, I'm still trying to understand the settings. Here are some random ones:
fastcgi_cache_path storage/cache levels=2:2 keys_zone=cacheresp:50m inactive=25m max_size=2000M;
fastcgi_temp_path storage/temp/;
fastcgi_cache_valid any 10s;
fastcgi_cache one;
fastcgi_cache_key $request_uri;
fastcgi_ignore_headers Cache-Control Expires;
fastcgi_cache_valid 200 302 3d;
fastcgi_cache_valid 301 7d;
fastcgi_cache_valid any 1d;
fastcgi_buffers 16 4k;
# For X-Accel-Redirect caching you also need to pass this
# header to upper proxy instead of processing right now.
# In nginx 0.8.7+ this can be done via
# fastcgi_ignore_headers/fastcgi_pass_headers. In
# older versions (including 0.7.61) use another header
# name and add_header directive instead.
fastcgi_ignore_headers X-Accel-Redirect;
fastcgi_pass_headers X-Accel-Redirect;
proxy_cache_key "http://cacheserver$request_uri $cookie_name";
It looks like a purely time-based cache would be easy, but communicating with the fastcgi server using headers should expire the pages in a more dynamic fashion. I'm still trying to understand that thread: how Drupal needs to be patched to modify the headers and the extra cookie, and then how to configure nginx to use the modified headers and/or cookie. Maybe it would work to first understand the Varnish config and then translate that to nginx, but I have no experience with Varnish.
I'm using my usual method of staring at the mailing list threads until it makes sense. If anyone can help me along, please post.
Some results
I've had a chance to play with nginx native caching a bit and so far I've come to these conclusions:
The Patch
The dev team has worked up a patch to Drupal core that is designed to make Drupal friendly to the Varnish and Squid caching reverse proxies. It makes a few helpful changes, such as no longer setting session cookies for anonymous users and sending proper Cache-Control/Expires headers that a reverse proxy can honor, among other things.
Note that removing session cookies for anonymous users is going to break the functionality in a few modules, such as letting an anonymous user put items into a shopping cart before they log in to purchase.
The development of the patch and the reasons for the changes are explained in this thread: http://drupal.org/node/147310. The patch was developed for D7 and committed there, but it has also been backported by David Strauss at Four Kitchens for incorporation into Pressflow (an optimized version of Drupal), and the backport is now used on d.o (see http://nnewton.org/node/9). The patch for D6 can be obtained from Four Kitchens' public versioning system using a bzr client and minimal diff/patch skills.
I've applied the patch for testing with Nginx.
Using native Nginx caching
Recent versions of nginx, including the latest 0.7.61 and above, feature integrated caching of responses from backend servers. Whether you're using an Apache backend behind Nginx, or a special server, or just fastcgi for php as a backend, nginx can cache the responses from those backends as static files on the file system. (Caching in RAM is a horse of another color and is achieved using memcached, which nginx also supports, but that's not the subject of this post.) Nginx will take a client request, check whether it has a static file in its cache for that request, and serve that static file if found. I don't know if it buffers the keys in memory or makes file system checks. If the file is not found, nginx passes the request on to the backend server and then, as it serves the response back out to the client, creates a static file for that request in its cache. The file is saved on the file system under an md5 hash of the cache key. The file will be saved and served for a set period of time and then flushed.
The settings for a simple system that caches everything and flushes it after 10 minutes are easy enough:
http {
...
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=two:10m inactive=5m max_size=500m;
...
}
server {
location ~ \.php$ {
...
fastcgi_cache two;
fastcgi_cache_key $request_uri;
fastcgi_ignore_headers Cache-Control Expires;
fastcgi_cache_valid any 10m;
fastcgi_pass_header Set-Cookie;
...
}
}
That defines a caching zone named "two" that saves up to 500MB worth of static files at /var/cache/nginx. "levels" defines the depth of the directory structure used, to avoid such things as 32k-files-per-directory limits, and should be adjusted based on the number of pages to be cached. The caching mechanism names the static files using the URI as a key (md5-hashing it), caches "any" responses (or you can limit it to certain http response codes, i.e. 200, 304, etc.), and purges them after 10 minutes. Nginx respects backend response headers such as Cache-Control or Expires when those headers say not to cache. Since those headers are created by Drupal to instruct the client browser, Nginx must be told to ignore them. Nginx must also be told to pass Set-Cookie, since it doesn't do so by default when caching.
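As an aside, you can work out by hand where a cached response lands on disk. This sketch mimics nginx's naming scheme under the levels=1:2 layout and /var/cache/nginx path used above; with levels=1:2, the last hex character of the key's md5 names the first directory level and the two characters before it name the second. The key is just an example URI.

```shell
# Hedged illustration of nginx's cache file naming under levels=1:2.
# The key and cache path are examples matching the config above.
key='/blogpost/testing-inline-module'
hash=$(printf '%s' "$key" | md5sum | cut -d' ' -f1)
l1=$(printf '%s' "$hash" | cut -c32)      # first level: last hex char
l2=$(printf '%s' "$hash" | cut -c30-31)   # second level: preceding two chars
echo "/var/cache/nginx/$l1/$l2/$hash"
```

Handy for checking with ls whether a given page actually got cached.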
Now with Squid or Varnish you are also going to cache static files (i.e. css files, js files, images, or whatever). The reason for this is that Squid and Varnish have shown themselves to be much faster at serving static files than Apache with mod_php, so rather than allowing Apache to serve those static files, they are cached by Squid or Varnish and served by them (without any php overhead that is included in every apache thread when you use mod_php). However, Nginx with fastcgi/php is also much faster than Apache at serving static files, and by some benchmarks just as fast as Squid or Varnish. Your own benchmarks on your production system are the only way to tell for sure if Nginx will serve static files for you as fast as Squid or Varnish. For my purposes here, I'm not caching static files, but using Nginx to serve them directly. All I'm caching with this config are the dynamic responses from the fastcgi/php backend.
Now if you're using Drupal only to serve anonymous users, and you don't mind your content updating every 10 minutes, this setup is fine. However, once you log in, Nginx will keep serving you cached files, so you won't be able to see your edits. Nginx will also cache all your admin work, which would be a waste of cycles and storage and also creates a security problem. You could set up an admin subdomain if you only have a few editing admins, but most Drupal installs allow basic authenticated users to log in to post and do their business, so this is not acceptable.
So the patch provides for the NO_CACHE cookie when people log in, and provides settings for Squid and Varnish to serve fresh pages if you have that cookie and to not cache any pages requested by a person with that cookie. Unfortunately, Nginx cannot do this at the moment. There may be some hackish way to configure it using if{} statements, but Igor (the developer of nginx) recommends something else. He recommends adding the cookie value to the key that the cache uses, so that when a person with the NO_CACHE cookie hits nginx, their url+cookie key will miss the cache, which is keyed on url+empty_cookie_value, and the request will be passed to the backend. This would work, except that the cache would then cache the new url+cookie responses as well, and when the person logged out, they would still get the logged-in pages that we didn't want in the cache in the first place. Igor's solution is for the php script (i.e. Drupal) to serve an additional http header with logged-in requests, telling Nginx on the way out not to cache those responses. Igor suggests the X-Accel-Expires header with a value of zero. Nginx will respect that header, just as it would respect Cache-Control or Expires, except that we're not ignoring this one, so responses from Drupal marked with that header will not be cached. Since the logged-in responses are never cached, the url+cookie requests will always miss the Nginx cache and be passed through.
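A minimal sketch of that arrangement, folding the cookie into the cache key of the earlier cache zone (the zone name "two" and the validity time are just placeholders), might look like:

```nginx
# Hedged sketch: the NO_CACHE cookie value becomes part of the cache key,
# and a backend "X-Accel-Expires: 0" response header vetoes caching.
location ~ \.php$ {
    fastcgi_cache two;
    fastcgi_cache_key "$host$request_uri$cookie_NO_CACHE";
    fastcgi_cache_valid 200 302 10m;
    # Cache-Control/Expires are aimed at browsers here, so ignore them,
    # but leave X-Accel-Expires alone so Drupal can opt out per-response.
    fastcgi_ignore_headers Cache-Control Expires;
}
```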
Unfortunately, the patch was not targeted at Nginx, so no such additional header is included. Squid and Varnish don't need it. Since I'm not a programmer (I'm clearly in a much more long-winded profession), I haven't attempted to add the header to the patch. It seems like it might require only a line or two in the correct spot. If someone would like to offer the appropriate patch, I'd be happy to test it.
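For what it's worth, the "line or two" might look something like this D6 hook in a tiny custom module (a hedged sketch only; the module name and choice of hook are my assumptions, not part of the patch):

```php
<?php
// Hypothetical hook in a small custom module: if the reverse-proxy
// patch's NO_CACHE cookie is present, tell nginx not to cache this
// response by sending X-Accel-Expires: 0.
function nginx_nocache_boot() {
  if (isset($_COOKIE['NO_CACHE'])) {
    drupal_set_header('X-Accel-Expires: 0');
  }
}
```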
Native Nginx caching or Boost
Finally, I have to ask myself whether caching is really worth the trouble, given the existence of a great module like Boost, if you're using Nginx. The fact is that Drupal+Boost with Apache+mod_php still isn't fast enough. Boost creates the static files, but you're still serving them with Apache+mod_php. Varnish or Squid in front of Apache+mod_php, serving their cached files, is going to blow the non-proxy setup out of the water, because Varnish and Squid are so much faster than Apache+mod_php at serving static files.
However, this may not hold true if you're using Nginx, because Nginx is also very fast at serving the static Boost files.
Boost also has several advantages. Since it works at the Drupal level, it can provide options like excluding specific pages from the cache, and expiring cached pages when they're edited. Boost can create two versions of every page, one uncompressed and one pre-compressed with gzip, and Nginx can decide which to serve based on the capabilities of the browser. Precompression can save CPU cycles over on-the-fly compression and can allow you to use the highest and most CPU-costly level of compression. Boost also includes a pre-crawler that can create your entire set of cache files during cron jobs. Of course, you could write your own cron-activated crawler to request pages through Varnish, but Boost includes a sophisticated Drupal-specific one that uses the Drupal path alias db. Boost accomplishes some of these feats by adding some db overhead while managing the cache, but the reduced load on the database from no longer using database caching for anonymous users is substantial.
So the question becomes, is native Nginx caching preferable to Boost caching at all if you're using Nginx? The way to answer that question is with extensive benchmarking, which I have not undertaken.
Any thoughts?
Edit: Fixed some formatting errors caused by my non-understanding of the Markdown filter and failure to use the Preview button.
Boost vs Varnish
That would be one heck of a fight... make sure Varnish is serving gzip and the test bot is requesting gzip, so we are comparing apples to apples. Speaking of gzip, with "Aggressive Gzip: Deliver gzipped content independent of the request header." turned on, Boost kicks some serious butt. I don't think that section of the htaccess rules has been ported to nginx yet. It's fairly simple:
# Gzip Cookie Test
RewriteRule boost-gzip-cookie-test.html cache/gz/boost-gzip-cookie-test.html.gz [L,T=text/html]
http://drupal.org/node/528506
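For anyone wanting to try it, a rough nginx translation of that rewrite might look like the following (untested sketch; the docroot path is an assumption):

```nginx
# Serve the pre-gzipped cookie-test file that Boost creates.
location = /boost-gzip-cookie-test.html {
    gzip off;                          # the file is already compressed
    add_header Content-Encoding gzip;
    default_type text/html;
    alias /var/www/drupal/cache/gz/boost-gzip-cookie-test.html.gz;
}
```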
Working nginx caching
I made a tiny module to insert the header that nginx requires. My current config is:
http {
...
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=two:10m inactive=1d max_size=500m;
...
}
location ~ \.php$ {
...
# Cache Settings
fastcgi_cache two;
fastcgi_cache_key "$host$request_uri$cookie_NO_CACHE";
fastcgi_ignore_headers Cache-Control Expires;
fastcgi_cache_valid 200 301 302 1h;
fastcgi_pass_header Set-Cookie;
}
I'm a little sad to lose the Global Redirect module's redirect to clean URLs, and I need to check whether Google Analytics cookies are being set. But generally it works correctly and quickly with the combination of the patch, the http{} and location{} config, and the tiny module.
Module Change
I changed the module to the way it should be:
- if (!empty($user->uid)) {
+ if (isset($_COOKIE['NO_CACHE'])) {
So that people who log out but still have their NO_CACHE cookie won't create new cache pages.
Patch
Create a patch and do an OR:
if (!empty($user->uid) || isset($_COOKIE['NO_CACHE'])) {
Post it back on the Boost issue queue.
The NO_CACHE cookie is for nginx only, correct?
Not a patch for the Boost module.
Not a patch for the Boost module. A patch for my little nginx_header module to send the extra header for the reverse proxy patch.
BTW, I did some quick ab benchmarking, and nginx was serving the Boost static files about 15% faster than from the native cache. I'm not sure why; maybe because of the md5 hashing of the file names, or the on-the-fly gzip compression? Beats me. Will do some more testing.
Tiny header module unnecessary
Edit: You do still need the nginx_header module and the "fastcgi_pass_header X-Accel-Expires;" to handle the NO_CACHE cookie.
You don't want the "fastcgi_ignore_headers Cache-Control Expires;" or "fastcgi_cache_valid any 10m;" in the config.
You need to set caching to "Aggressive" (normal won't send the header), set "Page Cache Maximum Age" to the desired expire time, and use something like the following to catch the cookie. Drupal will then give the correct expire time in the Cache-Control header, and Nginx will honor it.
location ~ \.php$ {
...
# Cache Settings
fastcgi_cache two;
fastcgi_cache_key "$host$request_uri$cookie_NO_CACHE";
fastcgi_pass_header Set-Cookie;
fastcgi_pass_header X-Accel-Expires;
}
Sorry for spamming the group mailing list.
Still problems.
Nope. Still problems. I suspect it has to do with aggressive caching. Since Drupal won't send the proper Cache-Control: max-age=x without being set to aggressive caching, and since aggressive caching disables hook_boot and hook_exit, my module for the NO_CACHE cookie can't send the X-Accel-Expires header. Might have to go back to ignore_headers and cache_valid.
Seems like caching on groups.drupal.org here is also messed up. Go to a page, hit a login link, enter credentials, get redirected back to the original page but you're not logged in. Hit login link again and get Access Denied because you're already logged in. Head back to the page and now it's logged in correctly.
Very interesting, How can i
Very interesting. How can I subscribe to this thread?
updates
Any news on how nginx+boost stands up to varnish?
none
Sorry, none here. I took a quick look at some benchmarking tools but haven't followed up. If we were to set up one of those Amazon EC2 instances and add nginx to it, what tools could we use to get a good comparison?
ab?
Can't go wrong with ab, right?
ab -n 10000 -c 100 http://www.example.com/
http://en.wikipedia.org/wiki/Web_server_benchmarking
I get the feeling ab stops
I get the feeling ab stops being as useful once you move to a caching solution. Serving one cached file to one agent might show the performance of your hardware and TCP stack rather than the proxy. Maybe something like this http://curl-loader.sourceforge.net/.
AB tests
OK, some down and dirty tests.
This is on my home server: a single-core AMD Sempron 2800+ at 1.61 GHz with 1GB of RAM. It's running Ubuntu 9.10 up to date, Apache-Prefork 2.2.12/mod_php 5.2.10, Nginx 0.8.19 with fastcgi/php 5.2.10, and Varnish 2.0.4 with the default settings. I've got the latest Pressflow 6.14.56 and Boost 6.x-1.13, running about 44 contrib modules on my home-made theme, though none of that matters since it's all just bits in a cache for the purposes of this test. The size of the page is 14393 bytes.
I'm running the ab line 5-10 times from the web server itself until I can eyeball a number of requests per second that looks typical. I'm clearing the cache between types of run.
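Incidentally, the eyeballing can be scripted by grepping the requests-per-second figure out of ab's output. A saved sample line stands in here for a live run (in real use you'd pipe ab itself into the awk):

```shell
# Hedged sketch: extract ab's requests-per-second figure.
# A saved sample stands in for a live ab run.
cat > /tmp/ab_sample.txt <<'EOF'
Requests per second:    3795.21 [#/sec] (mean)
Time per request:       52.698 [ms] (mean)
EOF
# Real use: ab -n 100 -c 20 http://example.com/page | awk -F'[: ]+' ...
awk -F'[: ]+' '/Requests per second/ {print $4}' /tmp/ab_sample.txt
# prints 3795.21
```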
Starting with no Drupal caching on Apache using the following ab line:
ab -n 100 -c 20 http://home.brianmercer.com:8080/blogpost/testing-inline-module
I'm getting right about 7 r/s. Dismal.
Running the same test on nginx
ab -n 100 -c 20 http://home.brianmercer.com:80/blogpost/testing-inline-module
I'm getting more like 7.3 r/s. Interesting for those folks proxying Apache through Nginx.
Now stepping up to Drupal "Normal" db caching:
Same ab line, Apache is right around 73 r/s and Nginx is at 83 r/s. Still pretty close.
Interesting fact, Drupal (or at least Pressflow) keeps a different caching setting for each web server. Did not know this. Had to go in and change the setting twice, once on each server.
OK, let's go to "Aggressive" caching:
Same ab line: Apache is about 115 r/s and Nginx is at a steady 127 r/s. Pretty good.
Now the fun begins. Apache on Boost. That's Drupal db caching disabled and Boost caching enabled. Prime the cache, run the same ab line a few times. The result? Apache on Boost: 544 r/s. Cool. We can easily step up ab to 1000 requests, keeping the same 20 concurrency, and it stays the same. Right around the 547 r/s mark.
So what about Varnish proxying Apache? Here's the line:
ab -n 10000 -c 20 http://home.brianmercer.com:6081/blogpost/testing-inline-module
Sure, a quick 10,000 requests to port 6081, the default Varnish port on Ubuntu. I've got it proxying to port 8080, on which I'm running Apache, though once a page is in the cache, it shouldn't matter where it came from.
I set my Boost caching to disabled, clear the cache, set Pressflow caching back to Aggressive, and set the minimum cache times to 10 minutes each. Note: if you're not set to Aggressive, Pressflow sends max-age=0 and Varnish will not cache the result. (Default Drupal doesn't send these headers at all, but you can patch it to do so by extracting the patch from Pressflow.) So I'm set to 10 min, and Firebug shows me a nice
Cache-Control: public, max-age=600
and what is the result? With Varnish serving the cached page it's right about 3208 r/s. Big step up. Very nice.
Now in this very thread I worked out how to use Nginx native caching, so it wouldn't be fair not to test that while I'm here. That requires me to uncomment my caching lines in /etc/nginx/sites-enabled/drupal-boost and then do an /etc/init.d/nginx reload. Done and done. Using the same ab line as Varnish:
ab -n 10000 -c 20 http://home.brianmercer.com:80/blogpost/testing-inline-module
and the result is... right around 3054 r/s. Not as good as Varnish. I'm disappointed, since I'm an Nginx fan, but can't argue with the results. Varnish is running about 5% faster than Nginx serving from the native Nginx cache. A quick look at top while running both tests shows varnishd occupying around 80MB of memory and Nginx occupying 3MB. Both spike from 25 to 50% CPU while running their 10,000 requests.
Finally, the one that mikeytown is waiting for. How fast will Nginx serve Boost files? Gotta comment out the Nginx caching lines again and reload. Disable Drupal db caching. Enable Boost caching. Run it once to prime the Boost cache. The results varied a bit and weren't as consistent as the other numbers. They came in between 3250 and 3550 r/s with the most common number being right about 3410 r/s. Yes, that's right. Using Nginx to serve from the Boost cache is actually faster than Varnish. Why? Dunno.
Varnish saves pages in its monolithic 1GB file located at /var/lib/varnish/[servername]/varnish_storage.bin. Nginx saves each html page separately in an md5-hash-named file in a designated directory, which for me is /var/cache/nginx, and runs a cache manager thread that keeps the hashes in memory. Boost saves each html page separately under its own file name, with a _ on the end, in the [html_root]/cache/[http_host] directory. Despite them all being in various files, I'd say they're all "in memory," since they all let the OS decide whether to cache a page in memory. I'd say in each case, after the first couple thousand requests or so, Linux gets the message that this is a popular file and should be kept in memory.
So why the difference in the speeds of serving the same file, off the same machine?
Interesting results. I'm sure mikeytown will be pleased. Now we just have to get one of the Mercury guys to install nginx and boost on their Amazon EC2 instance and run the tests there. It should be interesting as well!
I'd also be interested to see if anyone has used any different benchmarking scripts that simulate real world users more closely.
Results aren't quite right
varnish + Apache vs. nginx + native caching
instead of
varnish + nginx vs. nginx + native caching
and:
a) ab didn't fetch the actual page
b) varnish's cache wasn't being tested
You can see what's going on with varnish via varnishhist (graph of hits and misses) and varnishlog.
I'm sure you'll find that varnish outperforms both nginx native caching and nginx + boost.
Also, please attach the raw results from ab.
This was several months ago
This was several months ago and I later reran some tests using two Amazon EC2 instances instead of the single machine sending requests to itself.
Yes, with later tests I ran with keep-alives on.
I'm not sure what you're referring to or if I misstated at some point, but I ran three configurations: varnish+apache, nginx with native caching, and nginx with boost caching. Those were the setups in which I was interested. I never used Varnish in front of nginx.
I haven't been using cookies on anonymous pages for some time since I switched to Pressflow which includes the lazy session patch. Prior to my switch to Pressflow I was applying the backported patch to standard Drupal from the Pressflow bzr repo. I was using the Mercury preconfigured AMI on my later Amazon EC2 tests and used their setup for my home tests. I don't recall if they configured for cookies, but there shouldn't have been any cookies.
I don't recall any 301 redirects. I generally checked file sizes and response codes to make sure I was getting the real pages.
My later tests put Varnish at about 17% over nginx with Boost. I have no plans to conduct further tests at this time.
If someone develops a Drupal benchmarking project for comparative tests, I'd be happy to run the nginx portions. I think it would be worthwhile.
Hard Drive Latency
http://en.wikipedia.org/wiki/Access_time is interesting; I guess we can assume that when dealing with network latency, the difference between memory and disk is minimal. Did you happen to get CPU & memory load of ncache vs nginx+boost? Varnish uses 80MB of RAM, correct? I would guess CPU usage would be much lower when doing nginx+boost, since it doesn't have to run the MD5 function 3k times a second.
I linked your post to the front of the Boost page, hope you don't mind :)
I didn't look specifically at
I didn't look specifically at memory usage of nginx caching vs nginx+boost. (ncache is a different, older project and probably little used since caching has been integrated into nginx.) However, I've never seen nginx use more than a few MB. I think I'd need a different test method to actually measure CPU usage. I just glanced at top to see if there was a hugely noticeable difference in CPU usage between Varnish and Nginx, and there wasn't. They both fluctuated in the same range, and both spiked up to 50% at times. ab itself spiked up to 30%, so the only conclusion I could draw was that both used up most of the available CPU power to get the job done.
I'm not sure how Varnish works. You set a cache size for a monolithic cache file on the disk. They recommend that you make that large enough for every page and resource on your site, regardless of your RAM. Then the OS manages how much goes into memory. However, when I ran the default varnishd it ran two processes. One was small, only about 2MB, and the other was the largest process on my system at 80MB. Since I loaded only a couple pages throughout the entire test, and only one or two through Varnish, I can't imagine what was occupying 80MB. Maybe someone who knows more about Varnish can explain.
I'm happy to support your work.
This post from Igor suggests
This post from Igor suggests that creating TCP connections becomes a bottleneck and tests should be run with keepalives on: http://forum.nginx.org/read.php?2,12472
So, as long as it's still set up, here are three typical results.
Varnish:
bpm@c002:~$ ab -n 50000 -c 200 -k http://home.brianmercer.com:6081/post/inhibeo-brevitas-si-modo-usitas
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking home.brianmercer.com (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software: Apache/2.2.12
Server Hostname: home.brianmercer.com
Server Port: 6081
Document Path: /post/inhibeo-brevitas-si-modo-usitas
Document Length: 38344 bytes
Concurrency Level: 200
Time taken for tests: 13.174 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 50000
Total transferred: 1939790672 bytes
HTML transferred: 1918427008 bytes
Requests per second: 3795.21 [#/sec] (mean)
Time per request: 52.698 [ms] (mean)
Time per request: 0.263 [ms] (mean, across all concurrent requests)
Transfer rate: 143787.44 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 4 102.6 0 3013
Processing: 5 49 12.0 51 102
Waiting: 0 32 8.2 32 83
Total: 5 53 102.5 51 3042
Percentage of the requests served within a certain time (ms)
50% 51
66% 53
75% 56
80% 58
90% 63
95% 69
98% 74
99% 77
100% 3042 (longest request)
Nginx with native caching:
bpm@c002:~$ ab -n 50000 -c 200 -k http://home.brianmercer.com:80/post/inhibeo-brevitas-si-modo-usitas
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking home.brianmercer.com (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software: nginx/0.8.19
Server Hostname: home.brianmercer.com
Server Port: 80
Document Path: /post/inhibeo-brevitas-si-modo-usitas
Document Length: 38261 bytes
Concurrency Level: 200
Time taken for tests: 13.176 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 49551
Total transferred: 1934476396 bytes
HTML transferred: 1914771745 bytes
Requests per second: 3794.89 [#/sec] (mean)
Time per request: 52.703 [ms] (mean)
Time per request: 0.264 [ms] (mean, across all concurrent requests)
Transfer rate: 143381.21 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 32.9 0 3001
Processing: 7 50 200.5 29 3045
Waiting: 6 46 200.5 26 3043
Total: 7 50 203.7 29 3484
Percentage of the requests served within a certain time (ms)
50% 29
66% 33
75% 40
80% 46
90% 60
95% 71
98% 84
99% 90
100% 3484 (longest request)
Nginx with Boost:
bpm@c002:~$ ab -n 50000 -c 200 -k http://home.brianmercer.com:80/post/inhibeo-brevitas-si-modo-usitas
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking home.brianmercer.com (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software: nginx/0.8.19
Server Hostname: home.brianmercer.com
Server Port: 80
Document Path: /post/inhibeo-brevitas-si-modo-usitas
Document Length: 38344 bytes
Concurrency Level: 200
Time taken for tests: 12.107 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 49558
Total transferred: 1937853389 bytes
HTML transferred: 1917315032 bytes
Requests per second: 4129.68 [#/sec] (mean)
Time per request: 48.430 [ms] (mean)
Time per request: 0.242 [ms] (mean, across all concurrent requests)
Transfer rate: 156302.84 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 3 100.5 0 3005
Processing: 6 43 174.2 28 2946
Waiting: 1 39 174.2 24 2940
Total: 6 46 213.0 28 4220
Percentage of the requests served within a certain time (ms)
50% 28
66% 31
75% 34
80% 37
90% 41
95% 48
98% 56
99% 70
100% 4220 (longest request)
Boost still the winner.
Memcached?
Nginx has native support for memcached. Did you try that configuration?
I think that may be better than even boost ...
Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.
I didn't try that, but I'm
I didn't try that, but I'm interested.
From what I understand, Nginx doesn't put things into memcached, but only takes them out. I would use cacherouter and tell it to put the cache_page table into memcached with a key tied to the URL. The cache_page table already uses the URL for its index.
Then I'd set Nginx to check memcache using the URL as a key before going to the fastcgi backend.
Does that sound correct? I'll look into it.
Memcached with Nginx
Something like this?
server {
    location / {
        set $memcached_key $uri;
        memcached_pass name:11211;
        default_type text/html;
        error_page 404 = @fallback;
    }

    location @fallback {
        proxy_pass http://cluster;
    }
}
Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.
Yeah. The memcached key that
Yeah. The memcached key that cacherouter/memcache creates is along the lines of:
cache_page-http%3A%2F%2Fexample.com%2Fcontent%2Fquadrum-utinam
So it'd be more like:
set $memcached_key cache_page-http://$host$uri;
but that doesn't work, because Nginx looks for
cache_page-http://example.com/content/quadrum-utinam
so then you'd either need to hack cacherouter/memcache to remove the "urlencode()" or use Nginx embedded perl to "uri_unescape()" (see also http://drupal.org/node/525400#comment-2196908). Also, cacherouter/memcache saves to memcached with the headers:
a:1:{s:12:"Content-Type";s:24:"text/html; charset=utf-8";}
and about that time I decided a real programmer needed to take this on if it was gonna happen.
But I agree that it looks worthwhile.
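For reference, a minimal sketch of the nginx side of this idea, assuming cacherouter has been patched to drop the urlencode() so that keys are stored as plain URLs. The ports and the @drupal fallback are illustrative assumptions, and this still doesn't deal with the serialized header blob prefixed to the cached value:

```nginx
# Sketch only: assumes cacherouter stores keys unencoded, i.e.
# cache_page-http://example.com/content/quadrum-utinam
location / {
    set $memcached_key "cache_page-http://$host$uri";
    memcached_pass 127.0.0.1:11211;
    default_type text/html;
    # Cache miss (or memcached down): fall back to the PHP backend.
    error_page 404 502 504 = @drupal;
}

location @drupal {
    # The usual Drupal fastcgi setup would go here.
    include /etc/nginx/fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root/index.php;
    fastcgi_pass 127.0.0.1:9000;
}
```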
Memcache directly from Nginx
We've successfully done this with a minor fork of the memcache.inc, and we get fantastic response times. From another server in the same data center, we get response times on the order of 15-30 msec (this is on a live server under normal load). Our performance tests saturate the network (~3500 concurrent connections) before putting any strain on the server.
Blog: http://technosophos.com
QueryPath: http://querypath.org
Awesome. I can't wait to
Awesome. I can't wait to hear more about it.
I am interested to hear more
I am interested to hear more about this approach.
Please, can't you give us
Please, can't you give us more details on this approach?
I'm very very interested in this...
It sounds great. Can you
It sounds great.
Can you share your config files with us?
bennos
Stream Wrapper
There is some interest in this; should be attainable if using a stream wrapper.
http://drupal.org/node/563576#comment-1997172
This is a low priority for now, but in terms of running boost in a larger server cluster then this is where it could really shine.
OK, I see that what the core
OK, I see that what the core caching+cacherouter puts into the memcached bin isn't directly usable, but it's pretty close. Would just need to strip the junk at the beginning and write the key without the percent encoding, and nginx could grab it.
Could be a neat option for Boost.
I think static files + nginx
I think static files + nginx is the fastest possible configuration. You still serve from memory (remember there's kernel cache) and there's no overhead.
Still what kbahey suggests
Still what kbahey suggests would be great. Drupal caching pages into memcached and then nginx serving right from memcached without touching the file system. It would scale to multiple memcached servers and would give Drupal control over cache invalidation. And most of the work is already done in the memcached or cacherouter module.
But then the next step is to incorporate ssi into that scheme. Nginx native caching for the page shell with Drupal caching only panels/blocks into memcache and then nginx grabbing the cached blocks/panels from memcached and building the page. Much more efficient on memory and cpu when you're only storing a panel generated from a views query instead of an entire page, especially if you use that panel in multiple pages.
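That scheme might look something like the following; treat it purely as a sketch, since the key prefix, paths and ports are invented here, and Drupal would need new code to write rendered blocks into memcached:

```nginx
# Page shells cached by nginx would contain SSI tags such as
# <!--# include virtual="/esi/block/recent_comments" -->
location / {
    ssi on;                     # expand SSI includes in the cached shell
    fastcgi_cache cachezone;
    fastcgi_cache_valid 200 10m;
    include /etc/nginx/fastcgi_params;
    fastcgi_pass 127.0.0.1:9000;
}

# Each SSI include is answered straight from memcached.
location /esi/block/ {
    set $memcached_key "esi_block:$uri";  # prefix is an invented convention
    memcached_pass 127.0.0.1:11211;
    default_type text/html;
    error_page 404 502 = @render_block;   # missing block: have Drupal render it
}

location @render_block {
    include /etc/nginx/fastcgi_params;
    fastcgi_pass 127.0.0.1:9000;
}
```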
I'm afraid I don't have the skills for it though.
Yes, even without your "next
Yes, even without your "next step" it would still be great, because you don't need to implement tricky filesystem paths, etc.
And "next step" is great.
I may help with the SSI tags implementation, but I don't remember seeing a project on d.o dedicated to it. As far as I understand, joshk is building a Panels cache plugin for it.
Bottlenecks
I'm not sure, but it seems like both the Nginx core cache and Varnish hit the same bottleneck. I wonder if either project uses a LUT (lookup table). In video processing (where I used to lurk before Drupal) using a LUT was a must due to the performance increase it would bring. It seems like Nginx could probably be faster if it had a LUT of URL to MD5; it may already have one, I don't know. The only downside is that it's a performance/memory trade-off; it would be very similar to rainbow tables, if you have ever heard of those.

A 16-bit (65,536-entry) table with a key-value pair of 4096 bits (URL) and 32 bits (hash) would take around 32MB of memory. Using an 8-bit (256-entry) table would be another interesting idea: keep it on some kind of stack so the last 256 unique requests are in the lookup table. Keeping the array sorted for O(log N) lookup or better would be a good idea. I would probably go for the 8-bit table, due to the small memory requirement and fast sorting times.
I wonder if the hard drive bottleneck when using Boost can be avoided by using a RAID of SSD drives. I know for real-world usage, where everyone doesn't hit the same page, using an SSD would be a good idea. One more idea: putting the gzipped files on a RAM drive and the uncompressed ones on the SSD would be a good trade-off, since the vast majority of the pages served will be to gzip-enabled browsers.
Any other ideas on identifying and addressing bottlenecks, minus the obvious network one?
@brianmercer
Would you mind posting the config files for these 3 setups, so other people can give Nginx a spin and make sure we are using the best settings?
You're going a bit over my
You're going a bit over my head, but if I understand you, nginx may keep such a LUT in memory.
The Nginx cache setup requires that you define a "key-zone" for which you set a maximum memory size. I assume that key zone must store URLs and MD5 hashes. The size of the key zone is configurable and most default setups mention 10MB. The format is:
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=cache:10m inactive=1d max_size=2g;

with the key-zone name in this case being "cache" (you can define multiple zones) and its size being 10m. The maximum size of the cache on disk is set here to 2g. Nginx will write a line to the log if you fill your key-zone, so you know when to increase its size.
It does something similar for rate and connection limiting, defining a key-zone of a given size to store IP addresses, converted to binary to save space, so that you can set limits on concurrent connections and request rates to mitigate DDOS attacks.
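The limiting counterparts look like this on current nginx releases (older versions used limit_zone instead of limit_conn_zone); the zone names and limits here are made up:

```nginx
http {
    # $binary_remote_addr stores IPs in compact binary form;
    # a 10m zone holds state for tens of thousands of addresses.
    limit_req_zone  $binary_remote_addr zone=req_limit:10m rate=10r/s;
    limit_conn_zone $binary_remote_addr zone=conn_limit:10m;

    server {
        location / {
            limit_req  zone=req_limit burst=20;  # absorb short bursts
            limit_conn conn_limit 10;            # max 10 concurrent connections per IP
        }
    }
}
```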
I've been spinning up Mercury EC2 instances for fun and I read Greg Coit's ab/siege test method in the Pantheon group. I plan to mirror his testing on an EC2 instance for my own nginx cache and nginx boost tests. Then I can post my Nginx configs.
So the best recipe for performance
So the best recipe for performance of a Drupal site is Boost+Nginx, or Varnish+Apache? Which is better on CPU and RAM (with 1.3GB of RAM)? I love Apache+cPanel and hate the command line ;-P
Unrelated to the
Unrelated to the benchmarking, but related to the original thread, I'm looking at the "fastcgi_cache_min_uses" directive to use nginx caching as a sort of "throttle" module. For ordinary low traffic, all pages are dynamic. However, if you get more than a certain number of requests for a page within a certain time frame, it caches the page and serves the cached version. In this example, it serves dynamically until the same page is requested 100 times within 10 minutes, then it caches the page for 10 minutes, then starts over. You get caching of your popular pages, but not all pages. Digg insurance?
location ~ \.php$ {
try_files $uri @drupal; #check for existence of php file
include /etc/nginx/fastcgi_params;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
# fastcgi_pass unix:/var/run/php5-cgi.sock;
fastcgi_pass 127.0.0.1:9000;
# Cache Settings
fastcgi_cache cachezone;
fastcgi_cache_min_uses 100;
fastcgi_cache_key "$host$request_uri$cookie_NO_CACHE";
fastcgi_cache_valid 200 301 302 10m;
fastcgi_ignore_headers Cache-Control Expires;
fastcgi_pass_header Set-Cookie;
fastcgi_pass_header X-Accel-Expires; # for testing only
}
You would still want to use Pressflow/patched Drupal and the Nginx Header Module.
I read all the above posts and one
I read all the above posts, and one question got into my mind.
Varnish should work with Pressflow, but stock Drupal cannot work with Varnish, right?
Then could I use stock Drupal with the nginx + memcached combination?
As far as I know, there are two
As far as I know, there are two modifications that make Pressflow (and D7) suitable for reverse proxies: it sends cache-friendly headers for anonymous page views, and it doesn't hand session cookies to anonymous users.
I know that with nginx you can get around the first one by having the cache ignore the "do not cache" header and set the cache time in your nginx config.
The second one is more problematic. How does the reverse proxy know which requests to serve from the cache and which to bypass and serve live? Right now it does that by bypassing the cache for anyone with a session cookie. If you don't have a session cookie, then it knows you're anon and gives you the cached page. I suppose there're circumstances you could make it work, but it'd be a pain.
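For what it's worth, here's a rough sketch of how both workarounds look with nginx's native cache; the SESS match relies on Drupal's default session cookie prefix, and the zone name and cache times are assumptions:

```nginx
location / {
    fastcgi_cache cachezone;
    fastcgi_ignore_headers Cache-Control Expires;  # override "do not cache"
    fastcgi_cache_valid 200 301 302 5m;            # lifetime set here instead

    # Bypass the cache (lookup and store) for anyone with a session cookie.
    set $no_cache 0;
    if ($http_cookie ~* "SESS") {
        set $no_cache 1;
    }
    fastcgi_cache_bypass $no_cache;
    fastcgi_no_cache $no_cache;

    include /etc/nginx/fastcgi_params;
    fastcgi_pass 127.0.0.1:9000;
}
```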
So why not use Pressflow? Pressflow is good.
Note: memcached is usually used for data caching and is not usually involved in the full page caching that we're talking about with a reverse proxy.
I know that memcached is used
I know that memcached is used for data caching in order to off-load database server requests, right?
However, I read some replies from people saying that nginx could answer an HTTP request straight from memcached's cache (a full-page cache?) without diverting to the Drupal bootstrap process.
It's doable, but it's an edge
It's doable, but it's an edge case. See http://technosophos.com/content/53900-speedup-nginx-drupal-and-memcache-... I dunno if he ever released his patches to the memcache module or his nginx config.
What I'm talking about is using either Boost or nginx native caching which also serves cached pages without touching the Drupal php backend. Both save the pages to disk files instead of keeping them in memory like memcached. But with OS filesystem caching, files are saved in memory to the extent that memory is unused, and OS file caches are quite efficient.
It depends on the scale of your application. If you've got one server hosting your site (or a second mysql server) then there's no advantage using memcached over using nginx's native file cache or Boost and letting the OS filesystem cache keep the files in memory. Of course if you've got a few 64MB dedicated memcached servers, then that's a different story. That's what memcached is all about.
subscribed to this topic
subscribed to this topic
subscribing
Subscribing to the topic.