Nginx + gzip + Boost

I have read about configuring nginx + Boost for Drupal and decided to install Boost on my site.
My current stack is nginx + php-fpm + APC + Drupal.
Drupal has Cache Router installed, with all caches stored in APC.
APC is allocated 256 MB.

Could you help me configure nginx + Boost for the best performance?

  1. The content is dynamic (a news site). Will Boost correctly clear the cache after a news item is added, given that this changes a Views block shown on every page?
    Is it worth caching for 5, 10 or 15 minutes? Or should I configure Boost to cache only "complicated" pages, such as the front page with its many blocks?

  2. Boost can do "Aggressive gzip". Is that possible with nginx? My understanding is that with gzip on in nginx, every response (cached or not) is compressed again on each request, whereas Boost can store already-gzipped HTML, which saves CPU.

  3. What are the optimal numbers for workers?

    user nginx nginx;                  
    worker_processes 16;               

    error_log /var/log/nginx/error_log info;

    events {
            worker_connections  1024;
            use epoll;              
    }                               

    worker_rlimit_nofile 16384;

Comments

1. Boost has code to clear

1. Boost has code to clear views containing nodes that have changed, but I doubt it will cover every situation. You'll have to test whether it works for your site.

Cache time depends on how often your content changes and how much traffic you have. If your content only changes once a day, then you may want a longer cache time. Also keep in mind that the cache is flushed on cron runs, so if your cron is only set to run every 30 minutes, then a 5 minute cache time will still only be flushed every 30 minutes.

Even if your content changes every 5 minutes, if you have thousands of hits during a typical 5 minutes, then you'd take advantage of caching more than if you only get 2 or 3 hits in five minutes. If you're only getting 2 or 3 hits every five minutes, then it might not be worth the overhead of flushing the cache every 5 minutes.

2. I believe that aggressive gzip does a javascript check of gzip compatibility instead of relying on the browser to send the proper gzip capability header. I'm not sure what percentage of users are using a browser that incorrectly reports gzip capability, but I bet it's fairly low in 2010. I don't use this feature.

3. Generally you want to set the number of worker_processes to the number of CPU cores (or virtualized CPU cores) on your server. So 16 is probably too high unless you have quite a high end multi-cpu setup.
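
As a rough sketch of that rule of thumb, assuming a server with 8 CPU cores (the core count is an assumption for illustration; substitute your own), the top of nginx.conf might look like this. The directives are standard nginx; the values are placeholders to test, not recommendations.

```nginx
user nginx nginx;

# Rule of thumb from above: one worker per CPU core.
# ASSUMPTION: 8 cores. `grep -c ^processor /proc/cpuinfo` reports the number
# of logical CPUs, which includes hyperthreads.
worker_processes 8;

# Per-worker file descriptor limit; keep it above worker_connections.
worker_rlimit_nofile 16384;

events {
    # The upper bound on simultaneous connections is roughly
    # worker_processes * worker_connections (here 8 * 1024 = 8192).
    worker_connections 1024;
    use epoll;
}
```

Newer nginx releases (1.2.5 / 1.3.8 and later) also accept worker_processes auto; which detects the core count for you.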

Thank you for reply

Thank you for the reply, brianmercer!
1. Everyone dreams of a server that has only reads and no writes. For 8 hours a day we create content, about 100 nodes, so roughly one node every 5-10 minutes. One block displays recent news on almost every node page, so when a node is added, everything in the cache becomes stale.
In this scenario there will be many writes to the cache, so why not put the cache in RAM to avoid disk I/O?
One page costs about 50 KB as HTML plus 8-10 KB gzipped. 256 MB of cache / 60 KB per page is about 4.2k pages, which is more than enough for my site with a 5-10 minute cache lifetime.
Is that faster than keeping cached pages in APC? How much faster?

  2. Sorry for being misleading. What I meant is: which nginx configuration makes nginx serve the already-gzipped HTML from the Boost cache (the .html.gz file) to the client, instead of gzipping the .html file itself on every request?

  3. I have 2 Xeon E5620s in a fresh setup. Everything works fast, but some queries take a long time, as I described in http://groups.drupal.org/node/84269 (a MySQL problem).
    I tested the same setup on my C2D box, where it runs 10 times slower, so I would like to optimize everything before going to production.

It's definitely difficult to

It's definitely difficult to predict how it will perform.

As for putting caches into RAM, that's a whole other issue. Here are some points:
1. OSes have their own filesystem RAM cache, so if nginx is requesting a Boost-generated cache file often, the OS will cache it in RAM if there is space, especially if you believe your full dataset will fit into memory.
2. If you use APC for the page cache in RAM, such as using cacherouter and putting cache_page in APC, (or memcached, which is slower but probably more stable) you're still accessing PHP backends, rather than Boost which just uses nginx to serve a static cache file directly, so Boost may still scale better and perform faster since you're not running PHP code and tying up your PHP processes.
3. You could move to Varnish instead of Boost. Varnish has a sophisticated control language and can be set to keep the cache in memory or use a file and rely on the OS filesystem cache.
4. If you're adventurous, you could look into using a hacked memcache Drupal module to put cached pages into memcached and have nginx serve them directly from memcached. See http://technosophos.com/content/53900-speedup-nginx-drupal-and-memcache-...

To serve gzipped Boost files, you can use the gzip-static module for nginx, which is already compiled into the nginx that comes with some distros. That module checks for an .html.gz version of a file every time an .html file is requested. Have you seen the nginx configurations maintained by yhager and omega8cc? What are you basing your nginx conf on?
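
A minimal sketch of what that looks like in a config, assuming nginx was built with the gzip-static module. The server name and root path are hypothetical placeholders.

```nginx
server {
    listen 80;
    server_name example.com;              # hypothetical
    root /var/www/example.com/htdocs;     # hypothetical

    # When a static file such as page.html is requested and the client sends
    # "Accept-Encoding: gzip", nginx looks for page.html.gz next to it and
    # sends that file as-is instead of compressing page.html on the fly.
    gzip_static on;
}
```

Clients that do not advertise gzip support still get the plain .html file, so both versions need to exist in the cache directory.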

A dual Xeon E5620 is beefy! I'm not sure whether an 8 core/16 thread system should set worker_processes to 8 or 16. If you have high traffic, you may also want to increase worker_connections beyond 1024. I'd say search the nginx mailing list http://forum.nginx.org/list.php?2 and ask the questions there if you can't find the answers you need.

Just curious, what OS/distro/version are you using?

Cache into RAM I agree with

  1. Cache into RAM
    I agree as far as requests (reads) go: they will fit into the filesystem cache. I also have a hardware RAID controller with a 512 MB DDR2 cache and delayed (write-back) writes turned on, with battery backup.
    I haven't tested whether this cache will fit in the hardware RAID cache or in the OS filesystem cache.
    When Boost clears the cache (because one view has changed), it starts writing to the cache again. I prefer not to rely on whether those writes fit in the hardware cache.

I can create a tmpfs mount with an /etc/fstab entry such as

    none  /var/www/example.com/htdocs/cache  tmpfs  size=256M,nosuid,mode=1777,uid=nobody,gid=nobody  0 0

so that the whole cache fits into RAM.
To make sure the cache stays clean, I can simply run rm -rf from cron.

  2. I have tested how nginx performs at serving static files in my case.
    My index.html.gz now weighs 8 KB. The server has two Gigabit Ethernet ports, but in the data center it will be connected at 100 Mbit/s. In my test the host and client are also connected through a 100 Mbit/s switch. Even if I use a CDN or the second Ethernet port for images and heavy content, the connection will be the bottleneck.
    Here is the result of # ab -n 10000 -c 100 -k http://192.168.3.25/.html.gz run from another host:
    Server Software:        nginx/0.7.65
    Server Hostname:        192.168.3.25
    Server Port:            80

    Document Path:          /.html.gz
    Document Length:        8032 bytes

    Concurrency Level:      100
    Time taken for tests:   7.013 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    9900
    Total transferred:      82894620 bytes
    HTML transferred:       80336496 bytes
    Requests per second:    1425.86 [#/sec] (mean)
    Time per request:       70.133 [ms] (mean)
    Time per request:       0.701 [ms] (mean, across all concurrent requests)
    Transfer rate:          11542.56 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.2      0       2
    Processing:     7   70   3.3     70      85
    Waiting:        1   69   5.3     69      71
    Total:          9   70   3.3     70      86

    Percentage of the requests served within a certain time (ms)
      50%     70
      66%     70
      75%     70
      80%     70
      90%     70
      95%     70
      98%     71
      99%     72
    100%     86 (longest request)

So that's about 1.4k requests/sec and 0.7 ms per request (across all concurrent requests).
Transfer rate: 11542.56 Kbytes/sec x 8 = 92340.48 Kbit/sec, i.e. roughly 92 Mbit/s, which is very close to the 100 Mbit/s limit of the link.

On localhost the result is:

    Requests per second:    22521.81 [#/sec] (mean)
    Time per request:       4.440 [ms] (mean)
    Time per request:       0.044 [ms] (mean, across all concurrent requests)
    Transfer rate:          182748.67 [Kbytes/sec] received

But these stats are useless to me, as I will not have a gigabit Ethernet connection.

About Varnish: nginx is very fast at delivering static content, and using Varnish would be more complicated for me. I see no benefit in using it compared to Boost + RAM.
Also, APC holds all the precompiled PHP plus all of Drupal's cache tables. If cache_page grows too big, it will push that data out. Consider too that I may run 3-5 more Drupal domains, and they will all share the same APC pool.

Thank you for the advice to use gzip-static. I run Gentoo, mainly because of its Portage system, so to turn on gzip-static I just typed USE="gzip-static" emerge nginx. I have also installed PHP 5.3.3, which has php-fpm in core, so I don't need to apply the php-fpm patches. This release also has many bug fixes over the 5.3.2 tree. I'm testing it now. Portage makes it very easy to mix stable and bleeding-edge software. We have had 3 servers running successfully on Gentoo for about 3 years.

For the Boost config I will use omega8cc's http://groups.drupal.org/node/26363, located on the Boost project.

Could you explain this gzip-static to me?
When is .html.gz used and when plain .html?
Does it rely on the boost-gzip cookie when I turn on Aggressive gzip? Or, with gzip-static, does it by default use the HTTP header Accept-Encoding: gzip,deflate to decide whether to use compression?
What rewrite rules should be applied?
Does Googlebot accept gzipped pages? What about other clients that do not support JavaScript or iframes?

EDIT:
I used the config provided by omega8cc. Thank you for this config!
I added this to the server section:

gzip          off;
gzip_static          on;

Is it required?

Do I need to add gzip on; to location ~* .php$ and location @uncached, or will Drupal gzip the output itself?

Without gzip          on; to

Without gzip on; in location ~* .php$, the content is not gzipped by Drupal,
even though Gzip page compression (Boost & Core) is enabled on /admin/settings/performance/boost
and on /admin/settings/performance too.
Content-Length is the uncompressed size,
and the Content-Encoding: gzip header is missing.

Yes, you want both gzip

Yes, you want both

gzip on;
gzip_static  on;

Turning gzip off was just for testing purposes. You want both on for regular use. Turning them on globally in nginx.conf is fine.
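
For illustration, a global setup along those lines could look like the following in nginx.conf. Only the two directives quoted above come from this thread; the remaining values are assumptions chosen as common starting points and should be tuned for your site.

```nginx
http {
    # Compress dynamic responses (for example PHP output from Drupal) on the fly.
    gzip on;
    gzip_comp_level 5;        # assumption: middle ground between CPU cost and size
    gzip_min_length 1024;     # assumption: skip very small responses
    # text/html is always compressed; list only the additional types.
    gzip_types text/plain text/css text/xml application/x-javascript;
    gzip_vary on;             # send "Vary: Accept-Encoding" for downstream caches

    # For static files, send an existing pre-compressed .gz copy when the
    # client accepts gzip, instead of compressing on every request.
    gzip_static on;
}
```

With both enabled, Boost's pre-gzipped cache files are served through gzip_static, while pages that miss the cache and go through PHP are compressed by gzip on.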

None of our nginx configurations support Boost's "Aggressive gzip" checking. You either should turn it off, or write new support for it.

nginx uses the client header to decide whether to send gzipped pages both with regular on-the-fly gzipping and with gzip-static.

Aggressive gzip

mikeytown2's picture

I'll be changing this for the 1.19 release; working on the "if the cookie is there send gzip" part is ok; the redirect is what I'll be changing. I plan on using ajax instead of an iframe & i'll have a hook_menu as backup in case the file isn't there.

No idea how we'd add in that

No idea how we'd add in that check. If boost-gzip cookie exists then serve _.html.gz. I see no elegant way of doing that with the current config.

So we should check both for

So we should check both for that cookie and maybe go back to the old style with if (-f file) instead of try_files for .gz file then? It shouldn't be hard. At least for testing if the results are worth the additional complexity introduced.

[EDIT] I'm complicating it too much probably, we should be fine with adding another check if ($http_cookie... and maybe separate error handler or something to make sure both cookie is set and .gz file exist. Not sure if that makes any sense, just need to test it =)

Re: 1

I still need to test how it works with files pre-gzipped and stored in the cache by Boost, but it appears this Nginx module will handle that: http://wiki.nginx.org/NginxHttpGzipStaticModule. It needs to be enabled at build time with ./configure --with-http_gzip_static_module and then switched on with gzip_static on; in the Nginx configuration file. In fact, I have always used this in my config; I just didn't test it with my rewrites for Boost, contributed initially by Brian Mercer. Or maybe Brian tested it already and I overlooked that? Brian?

[EDIT] Brian already mentioned the gzip_static :) http://groups.drupal.org/node/84519#comment-262879

I did test it when we first

I did test it when we first worked on the config. I did something simple: I replaced the _.html.gz file with an empty file and checked whether I got a blank page. It worked as expected.

Edit: it looks like I also did "gzip off" in nginx.conf and then tested to see whether I got gzipped pages only for Boost pages. That showed success also. http://drupal.org/node/244072#comment-1747888

So it seems we are already

So it seems we are already fully compatible with Boost's pre-gzipping, provided the Nginx version in use has gzip_static available and enabled. Thanks!

With a site with many views

jcisio's picture

On a site with many views and many nodes, it takes forever to clear the cache in the db (each modification requires tens of thousands of row deletions for each view). On such sites, I recommend turning off all db management and just using cron to remove old cache files.

Read more http://drupal.org/node/715450 or check a hack http://drupal.org/node/803458

Retro mode

mikeytown2's picture

Look for it in the advanced section.
