Varnish w/nginx for Static Content

waverate's picture

(Referred from http://drupal.org/node/1255892)

I have been working on two development servers to try out a concept:

Server 1.
varnish -> apache > drupal, and

Server 2
nginx -> apache > drupal, with
nginx -> (jpg, txt).

I have read only good things about running nginx with no modules to serve static content (images, text files, etc.) and it seems to work very well on server 2.

Both development servers are working very well. What I would now like to try is to combine the benefits of both, but I do not know which is the preferred setup:

Setup 1.
varnish -> apache -> drupal, and
varnish -> nginx -> (jpg, txt); or

Setup 2.
nginx -> varnish -> apache -> drupal, and
nginx -> (jpg, txt).

Does anyone have experience with this? Are there examples/configs?

Comments

Why does Apache have to be in

Fidelix's picture

Why does Apache have to be in the picture?
It is slower than nginx for dynamic content too, unless you use mod_php, which sucks and eats huge heaps of memory.

I'd use Varnish -> Nginx -> Drupal/static. Nginx can act as a reverse proxy too, but for now it's not so mature, so Varnish is still needed.

Why use Apache

matthewv789's picture

The main reason mod_php uses up huge amounts of memory is that, with KeepAlive enabled, the threads created for PHP (and expanded to whatever size needed to process that php) stick around and are subsequently re-used - still taking up that same amount of memory - to serve up CSS files, images, etc. which could be served by much smaller threads. So turn off KeepAlive in Apache, and those threads will die as soon as they are done delivering the page, freeing up their memory quickly. Also since Apache would only be used for serving PHP in the first place, the number of threads that get created will be much lower on most sites. (KeepAlive will still be on for nginx, so you still get the benefit.)

And it does turn out that Apache+mod_php is up to 25% faster than nginx+fcgi, as well as possibly still being more stable. Still, it may be worth benchmarking both configurations on your actual site with representative traffic to see the differences in performance and CPU and memory usage (also taking into account how cheap/easy it is to expand each).

As for Varnish -> Nginx, I wouldn't bother. Every benchmark I've seen shows nginx as being, if anything, FASTER than Varnish. I'm not saying to avoid Varnish, but adding it in front of nginx probably won't buy you anything performance-wise.

While nginx is actually a very good and mature reverse proxy and caching proxy (possibly used more widely for the latter than Varnish, outside the Drupal world at least), I would tend to use nginx more as a server than a proxy, so you don't have to worry about expiring cached files - let Drupal and Boost manage the expiration of cached HTML and aggregated CSS/JS files directly. (Images wouldn't even be an issue since it would be serving the source files directly, not a cache of them.)

So, are you saying that

Fidelix's picture

So, are you saying that apache eats less RAM than nginx with keepalive disabled? Don't kid me ^^

And it does turn out that Apache+mod_php is up to 25% faster than nginx+fcgi, as well as possibly still being more stable

You don't have any data to back that up. And you won't have until Apache is considerably refactored.

Apache for familiarity

waverate's picture

Thank you Fidelix.

I am running Apache with php-fpm. I tried using nginx with php-fpm on Server 2 for the dynamic content but I didn't really notice any difference in performance. It made me consider that perhaps the effort of learning nginx configuration files was not worth it; or perhaps I didn't configure it correctly.

However, right now the httpd processes are using 40-90 MB of memory. I would hope to be able to use the slim nginx process to serve static content like I did on Server 2.

The one setup that I have not been able to figure out is how to run nginx for dynamic and static content. I would have thought those were two different configuration of nginx: one with additional modules and one without.

Can you elaborate on your nginx configuration for serving dynamic and static content? Is it more than the nginx wiki (http://wiki.nginx.org/Drupal)?

nginx - 2 use cases

matthewv789's picture

To run nginx as an Apache replacement, it needs to be running fast cgi so it can handle PHP. Otherwise its configuration should essentially match a regular Apache configuration (translated into nginx syntax).
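As a rough illustration of this first use case, a minimal sketch of nginx handing PHP to a FastCGI listener could look like the following. The server name, document root and the 127.0.0.1:9000 listener address are assumptions, and this is far from a complete Drupal configuration (for one thing, it does not protect private files).

```nginx
server {
    listen 80;
    server_name example.com;        # assumption: placeholder host name
    root /var/www/drupal;           # assumption: placeholder Drupal root
    index index.php;

    # Serve the file if it exists, otherwise hand the path to Drupal.
    location / {
        try_files $uri @drupal;
    }

    # Rough equivalent of the clean-URL rewrite in Drupal's .htaccess.
    location @drupal {
        rewrite ^/(.*)$ /index.php?q=$1 last;
    }

    # nginx does not run PHP itself; it passes the request to a FastCGI listener.
    location ~ \.php$ {
        try_files $uri =404;          # do not pass nonexistent scripts to PHP
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;  # assumption: php-fpm or another FCGI manager listens here
    }
}
```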

To run nginx as a server for static files in front of Apache, it needs to check for images etc. that it can load, and also check for other URLs in the Boost cache directory, then otherwise (if it's .php or 404, or a POST request, or with a Drupal cookie, or https...) pass the request through to Apache, which will be running on a different port. (It also needs to block loading of .htaccess, xyz.inc, and other "dangerous" file types you wouldn't want to load directly as a file, ie, where the source contents should be private.) In this case, you can then turn off KeepAlive for Apache. Between threads dying quickly and fewer being created in the first place, Apache's memory usage should also be much lower in this scenario.
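A hedged sketch of this second use case follows. The paths, the file extension lists and the Apache port 8080 are assumptions, and the Boost cache lookup (with its cookie and request-method checks) is left out to keep the example short.

```nginx
server {
    listen 80;
    server_name example.com;        # assumption: placeholder host name
    root /var/www/drupal;           # assumption: same docroot Apache uses

    # Headers forwarded to Apache; inherited by the locations below.
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # Never expose .htaccess or source-like files directly.
    location ~* (^|/)\.ht|\.(inc|module|install|engine)$ {
        return 404;
    }

    # Static files: serve from disk, fall back to Apache if missing.
    location ~* \.(jpg|jpeg|png|gif|ico|css|js|txt)$ {
        try_files $uri @apache;
        expires 7d;                 # assumption: arbitrary lifetime
    }

    # Everything else (PHP, clean URLs, POSTs) goes to Apache.
    location / {
        proxy_pass http://127.0.0.1:8080;
    }

    location @apache {
        proxy_pass http://127.0.0.1:8080;
    }
}
```

With this in place, Apache listens only on the loopback port and KeepAlive can be switched off on the Apache side.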

nginx + fast CGI to run php is, if anything, a bit SLOWER than Apache + mod_php. (I'm not sure how memory usage compares in that case.) But nginx is 2-10x faster than Apache for serving static files, and uses less CPU (maybe ~1/2) and far less memory in the process. (Memory usage is also remarkably stable, unlike any other web server.) So the benefit comes from using nginx for static files, not when running php. This can always help when it comes to images etc.; how much it helps for your HTML pages depends on whether they are mostly anonymous or mostly authenticated.

So the benefit comes from

Fidelix's picture

So the benefit comes from using nginx for static files, not when running php

That's not entirely true.
You WILL get benefits from running nginx with PHP as well. The simple fact that it uses CPU and memory much more efficiently brings huge benefits for busy sites.
Removing ANOTHER webserver from the stack (apache, in that case) also brings a small benefit from a speed and also a maintainability point of view.

Apache + mod_php

matthewv789's picture

Apache + mod_php is still the fastest way to run php. Its high memory usage is mainly due to its tight coupling and the use of keepalive (using the same memory allocation for php and non-php), which is not an issue if it's not being used for static files and if keepalive is disabled.

Can you elaborate, please.

perusio's picture

It should be slightly faster than using fpm. Of course there's a huge can of worms lurking. But I was under the impression that the reason it uses huge amounts of memory is mostly an architectural issue: prefork or worker, both impose a heavy price on connection acceptance.

php

matthewv789's picture

PHP processing requires about the same memory whether fcgi or mod_php.

The problem with mod_php is a thread expands to the size needed for php (up to max limit), which is fine, but then stays open and is reused for other things at its still-bloated size. That's where the waste occurs - threads pile up that are a lot larger than they need to be. With fcgi the php is separated from the web server thread(s) so this doesn't happen, though memory leaks are not unheard of.

Other inefficiencies of httpd are canceled out by the similar overhead of loading and communicating with php via fcgi. So when running php, httpd's disadvantages disappear.

By using httpd ONLY for php and disabling keepalive, memory usage should be similar to other servers with fcgi, and speed for php pages using mod_php is up to 25% faster than even nginx + fcgi. Many people are surprised by this. :)

Hmm

perusio's picture

Even if it's true that having the language embedded in the server is faster (let's leave security issues aside for the moment) you're forgetting that since you seem to suggest not using Apache for static content, you'll be setting it up as an upstream. Hence there's the overhead of the communication between the reverse proxy and the upstream. So if on one hand the communication between a web server and the FCGI listener is slower than having the language embedded, on the other there's the chatting of the reverse proxy with the upstream over HTTP.

Furthermore the MPM is a much heavier process resource wise than the master process of fpm.

I would like to know in which conditions you derived that number: "25% faster".

httpd mod_php

matthewv789's picture

The ~25% faster is from recent benchmarks presented at BADCamp by a Drupal core developer and performance expert whom I won't name, as the numbers are from memory and he doesn't have the slides online yet for verification. (He, too, was focusing on nginx/fcgi, not nginx -> httpd/mod_php, and wasn't recommending the latter configuration, but did have a slide comparing their performance.) These are of actual Drupal pages.

Dries several years ago benchmarked Drupal using lighttpd/fcgi, and it too was about the same as or a little slower than httpd/mod_php (but these results are pretty dated): http://buytaert.net/drupal-webserver-configurations-compared, the direct comparison is on the last slide. Yes, you can argue there are security implications to running php as the httpd user, though it's not always clear which is more "secure", depending on your server setup, ie, shared host vs dedicated, or how exactly fcgi permissions are configured. On the other hand, I've seen reports of either instability or lack of support for certain features or php code with fcgi. Some people also prefer to use existing httpd.conf or .htaccess rules instead of needing to translate everything to nginx configuration. So there are other pros and cons besides just performance.

http://blog.a2o.si/2009/06/24/apache-mod_php-compared-to-nginx-php-fpm/ and http://blog.litespeedtech.com/2010/01/06/benchmark-comparison-on-serving... show httpd/mod_php being about 1.5-2.5x as fast as nginx/fcgi/fpm (an even bigger difference than I was expecting, though these are very simple php "hello world" pages and not representative of Drupal performance).

The one odd result is whether to use KeepAlive or not - these tests suggest leaving it on performs and scales considerably better for httpd (but not much for other servers). However, these are atypical synthetic tests - a single user making hundreds of simultaneous requests for the same php page, meaning it can re-use the same connection. (For the others, the fcgi connection is probably not persistent even if the http connection is.) This doesn't match the real world, where you would have many DIFFERENT users with DIFFERENT connections (no single user is going to request hundreds of php pages per second unless for a DOS attack...). So the KeepAlive results might not be valid for real-world use cases. On the other hand, what if Apache sees nginx as a single, persistent connection, rather than seeing individual connections to individual end clients?

Also, the non-keepalive scalability for Apache, where it hits the wall at the highest concurrency in one test, could be the result of particular server settings. Slightly different settings could move the "wall" out farther, and the other servers might hit the same wall at a slightly different point perhaps just beyond where this test ended, again, depending on their particular settings. So that one point is probably not a reliable result with much meaning without a lot more information and context - ie, testing at still-higher levels of concurrency and with various settings.

In the real world with more complex php content, this advantage would be much less than 2x - as with the Drupal-specific benchmarks - since the creating of threads, loading of php etc. would be a much smaller part of the total processing time.

These sorts of results have been consistent whenever I've seen people actually run the benchmarks comparing PHP performance of httpd/mod_php vs [someotherfastserver]/fcgi. They either come out about the same or with a sometimes significant performance advantage for mod_php. All too often, though, people just test performance on static files and assume the results (2-10x performance improvement over httpd in requests/sec with much lower CPU and memory usage) will carry over to php performance. These days fcgi combined with a fast web server is great, and it's a perfectly good way to run a web server - maybe the best overall, depending on your priorities. But it's not faster at running PHP than httpd with mod_php.

As for the "chattiness" between the web servers, it would have some effect, but I think it's a pretty simple handoff with not a lot of "chatting" required. If nginx fails to find a file to load (which it should know very quickly), it just sends the request on to the port httpd is running on (which shouldn't take long), which proceeds as normal, then nginx forwards the result to the browser (which also shouldn't take long). The total performance is probably still a little better than handing the request off to fcgi, but only some benchmarks would tell for sure.

In the end, I think both approaches are very good options, with some pros and cons, and it's not 100% clear yet if there's a strong advantage for either. I feel like combining the two uses the best tool for each job; others prefer the single-server approach for the simplicity of fewer moving parts or for other reasons.

The one thing we can all agree on, though, is that relying on httpd/mod_php to serve up static files is going to be slow, resource-intensive, and not very scalable, compared to either a faster web server or a caching proxy.

Apparently you didn't read

Fidelix's picture

Apparently you didn't read anything I wrote here:
http://groups.drupal.org/node/175644#comment-618604

I'm not even considering security or other ridiculous problems you run into when using mod_php.
This whole comment of yours is entirely invalidated by the "real world application performance comparison results" on the first link you provided.

In the end, it doesn't matter if mod_php is faster for helloworld.php if it fails MISERABLY with full applications. Nginx keeps the memory/CPU usage steady on helloworld or drupal+100 modules. It performs much better for busy sites or complex applications. All benchmarks point to that.

Again, we are talking about mod_php vs php-fpm for running applications (drupal), not helloworld.php

That first set of benchmarks

brianmercer's picture

That first set of benchmarks you linked show nginx+php-fpm beating apache+mod_php on complex php applications such as Drupal, though only slightly.

This concurs with the consensus among the Drupal+nginx users that php performance is comparable to Apache+mod_php, but that nginx excels in static file serving, caching, ssl handling, gzip handling, configuration and scripting, memory use, etc. Some of the memory use benefits can be obtained by using Apache worker, but then you lose whatever benefits you might have from in-process php.

hey, aren't we forgetting PHP

Fidelix's picture

hey, aren't we forgetting PHP unix sockets?

It avoids the TCP overhead even for loopback that still needs to encapsulate / decapsulate traffic, or fragment the information which is not uncommon for real world applications as the loopback MTU is generally just 16436.
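For illustration, switching from TCP loopback to a unix socket is mostly a matter of changing the address nginx passes to. This is a sketch only: the socket path is an assumption and must match whatever the php-fpm pool's `listen` setting points to.

```nginx
# http context: name the PHP backend once so it can be swapped easily.
upstream php {
    # TCP loopback variant, for comparison:
    # server 127.0.0.1:9000;

    # Unix socket variant; assumption: php-fpm listens on this path.
    server unix:/var/run/php-fpm.sock;
}

server {
    listen 80;
    root /var/www/drupal;           # assumption: placeholder Drupal root

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass php;           # refers to the upstream block above
    }
}
```

The socket file has to be readable and writable by the user nginx runs as, which is the usual stumbling block with this setup.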

Check out this comparison:
http://blog.a2o.si/2009/06/24/apache-mod_php-compared-to-nginx-php-fpm/

mod_php beats fcgi for helloworld.php WITH keepalive. Without it, it downscales considerably.

Now check "Real world application performance comparison results".
php-fpm beats mod_php.

You still need to take into consideration the fact that for those benchmarks PHP-FPM was running with a process limit of 16 for the current pool.

Read this comment, which states that php-fpm had unfair disadvantages in this test, and the author agreed:
http://blog.a2o.si/2009/06/24/apache-mod_php-compared-to-nginx-php-fpm/c...

waverate, you won't notice

Fidelix's picture

waverate, you won't notice speed differences between apache+php-fpm and nginx+php-fpm without running some heavy benchmarks. You'll only see real results (especially regarding requests/second and memory) under heavy load.

Please, take some minutes with the nginx config syntax. It's JUST english, and much, much, much easier and cleaner than Apache. Seriously.

I'm just saying that you are keeping an additional, unnecessary layer (apache) for serving content, while nginx is fully capable of serving dynamic content without the heavy memory footprint that apache has.

About configuration:
I'm using perusio's: https://github.com/perusio/drupal-with-nginx

You may run into some troubles because he's using configuration directives for some nginx modules that aren't present on standard distro packages, so you can just remove those directives once you see the error.

Both modules

perusio's picture

are part of Nginx core:

  1. FastCGI for proxying to PHP's FastCGI process (be it php-fpm or any other FCGI process manager).

  2. Core for serving static content (the default content handler).

There's nothing to compile in explicitly. I use other modules. Namely Upload progress to provide the upload progress bar. But you can comment out that part or use my deb package of the latest version.

There are a few things that I

jcisio's picture

There are a few things that I think I should note:

  • Mod_php is as secured as fcgi (if you have a dedicated, or at least a VPS).

  • Any benchmark about serving php with a concurrency > 5 is much less useful. Suppose one server serves about one million page views per day, of which 90% are from anonymous visitors, of which 90% of pageviews are cached (by boost or any reverse proxy). So your web server has to serve only 210,000 page views per day, or 2.4 pages per second, or about 5 pages per second at peak time. YMMV, but 1,000,000 page views per day per server is quite high.

  • Both nginx and Varnish are good for static content. If you already have Varnish, I don't see why you need nginx for static content, and vice versa. Most of the time, the reverse proxy serves static content directly from memory.
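As a cross-check of the estimate in the second bullet, the arithmetic can be worked through as below. This assumes the figures mean that 90% of all page views are anonymous and 90% of those anonymous views are served from cache. Under that reading the uncached load comes to about 190,000 per day, a little under the 210,000 quoted, and the per-second conclusion is barely affected.

```latex
\begin{align*}
\text{cached views}   &= 1{,}000{,}000 \times 0.9 \times 0.9 = 810{,}000 \\
\text{uncached views} &= 1{,}000{,}000 - 810{,}000 = 190{,}000 \\
\text{average rate}   &= \frac{190{,}000}{86{,}400\ \text{s}} \approx 2.2\ \text{pages/s} \\
\text{rate for the quoted } 210{,}000 &= \frac{210{,}000}{86{,}400\ \text{s}} \approx 2.4\ \text{pages/s}
\end{align*}
```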

I've been using nginx for static content and as a reverse proxy to Apache for php for 2 years. It's good enough, and it saves me the time of testing an all-nginx configuration. But in the long term, I'd switch to either Varnish/Apache (more likely) or all-nginx.

Varnish is more flexible with its multi-stage process (nginx only has one stage), so that you can modify a request before sending it to the backend or a response after receiving it from the backend. Nginx is simpler, but it is not as popular/stable as Apache in serving PHP.

Nginx is simpler, but it is

Fidelix's picture

Nginx is simpler, but it is not as popular/stable as Apache in serving PHP.

Do you have anything to back that up?
Nginx doesn't serve any PHP, it proxies stuff to CGI.

Now you would be comparing FastCGI vs mod_php about which one is more stable.
Due to its sandbox-per-request nature, FastCGI is naturally more stable, and this is proven by all big projects out there using it instead of mod_php.

Mod_php is as secured as fcgi (if you have a dedicated, or at least a VPS).

I disagree. If you run multiple projects on this dedicated server, one exploit in one of these projects can compromise all of your projects. How can you possibly say that mod_php is as secured as fcgi? I don't understand...

Any benchmark about serving php with a concurrency > 5 is much less useful. Suppose one server serves about one million page views per day, of which 90% are from anonymous visitors, of which 90% of pageviews are cached (by boost or any reverse proxy). So your web server has to serve only 210,000 page views per day, or 2.4 pages per second, or about 5 pages per second at peak time. YMMV, but 1,000,000 page views per day per server is quite high.

I disagree again.
There are sites that can't take advantage of anonymous caching. And since using boost the cached page will be served by the webserver, nginx would be much faster as that's static content.

I honestly see no reason for someone to use Apache in 2012 other than legacy support for some obscure modules like svn or webdav (which are also reaching a stable status in nginx), or some web control panels like Plesk or something else (cpanel already supports nginx, I think).

I am pretty sure that Varnish + Nginx is faster than Varnish + Apache. You just have to put the pieces of benchmark together. I see no space for Apache here...

@Fidelix:this is proven by

jcisio's picture

@Fidelix:

this is proven by all big projects out there using it instead of mod_php

Any info to back that up? I don't see the same thing. I only know that examiner uses nginx. Most of the other largest sites (Drupal sites like symantec.com or popsci.com, the biggest PHP forums using phpBB/vBulletin) all use Apache. Read this to see that mod_php is faster and more stable than mod_fcgi or FastCGI: http://2bits.com/articles/apache-fcgid-acceptable-performance-and-better... (some information is no longer correct, but the overall point still stands)

If you run multiple projects on this dedicated server

Never do that, period. Unless it is your dev server, or you don't care about security/performance. Each project should be on an isolated instance. If not, any compromised website can use up your APC memory or MySQL database space and take the whole server down with all your projects. Put each of your projects in an OpenVZ instance.

There are sites that can't take advantage of anonymous caching. And since using boost the cached page will be served by the webserver, nginx would be much faster as that's static content.

Well, I'm talking about a high-performance configuration, with a proper caching system/reverse proxy, where Apache serves only a minimal amount of static files. One million pageviews per day is a dozen million hits per day; of course I didn't say that Apache would serve all of them, which could lead to very low performance compared to nginx. I don't deny it; on the contrary, a year and a half ago I shared my nginx improvement vs Apache in this same group.

Again, mod_php is NOT more

Fidelix's picture

Again, mod_php is NOT more stable than FastCGI on nginx, and the 2bits post is NOT backing you up on this.
Nginx can only serve dynamic pages through its implementation of CGI; how much effort do you think was put into that?

mod_fcgi might be unstable, but run nginx with php-fpm and that's as stable as it can get. And PHP-fpm will be in PHP 6 core, if you don't know.

From 2bits:

Conclusion
If pure speed is what you are after, then stay with mod_php.

However, for better resource usage and efficiency, consider moving to fcgid.

So, if you don't want to trash your resources, use FastCGI. It's basically that. It will naturally scale much better than mod_php, as the a2o.si benchmark proves.
If you are running a small project, you want to SAVE resources, so you shouldn't use Apache or mod_php.
For big projects, you want to scale, so you don't want to use Apache or mod_php.

Never do that, period. Unless it is your dev server, or you don't care about security/performance.

I disagree. There are legitimate use cases for that.

I am pretty sure that Varnish

cweagans's picture

I am pretty sure that Varnish + Nginx is faster than Varnish + Apache. You just have to put the pieces of benchmark together. I see no space for Apache here...

I'd like to throw in that you don't need Varnish if you're using Nginx. Use the built in Nginx page cache + memcache to store it. I'll be posting a new version of http://cweagans.net/blog/2011/10/26/drupal-hosting-adventure to include that sometime soon.

--
Cameron Eagans
http://cweagans.net

I'd like to throw in that you

Fidelix's picture

I'd like to throw in that you don't need Varnish if you're using Nginx. Use the built in Nginx page cache + memcache to store it.

Actually, I tried something similar with Barracuda.
Nginx's page cache implementation is in contrib (AFAIK), and unfortunately not so stable yet (from my not so extensive tests). And there's nothing as powerful as VCL on nginx yet, though I remember there was a guy implementing Lua programming in nginx config files.

If these stuff got into nginx as standard modules, then maybe it could be an equal alternative to Varnish.

@Fidelix

perusio's picture

Nginx's page cache implementation is in contrib (AFAIK), and unfortunately not so stable yet (from my not so extensive tests).

Don't know what you're talking about. In fact, the Nginx cache has been part of Nginx core for some time. There was a contrib project, ncache, that got merged and is the basis of the current Nginx cache.

And there's nothing as powerful as VCL on nginx yet, though I remember there was a guy implementing Lua programming in nginx config files.

VCL is one of the ugliest things I've ever seen. Beating FORTRAN77 and Perl. I don't see why you need it. You can do pretty much anything with the much cleaner and saner Nginx config language. If you want to do more kinky stuff then you can use agentzh's Nginx Embedded Lua. I've never seen the need to use it.

Also as a note. Nginx caching follows a different approach than Varnish caching. You don't care about the overly complicated expiration logic. You use a small TTL and tweak the cache loader if needed. You just need to keep the cache warm and for that you use a crawler.
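A minimal sketch of this "small TTL" style of caching with the FastCGI cache from Nginx core is shown below. The zone name, paths, sizes and the 3 minute TTL are all assumptions chosen for illustration, and the session-cookie test is deliberately crude.

```nginx
# The next two blocks belong in the http context.
# assumption: cache location, zone name and sizes are placeholders.
fastcgi_cache_path /var/cache/nginx/drupal levels=1:2 keys_zone=drupal:10m
                   inactive=10m max_size=1g;

# Skip the cache for requests carrying a Drupal session cookie.
map $http_cookie $no_cache {
    default 0;
    ~SESS   1;
}

server {
    listen 80;
    root /var/www/drupal;           # assumption: placeholder Drupal root

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;            # assumption: FCGI listener address

        fastcgi_cache drupal;
        fastcgi_cache_key $scheme$request_method$host$request_uri;
        fastcgi_cache_valid 200 301 302 3m;     # short TTL instead of expiration logic
        fastcgi_cache_use_stale error timeout updating;

        # Drupal's own cache headers would otherwise prevent storing pages.
        fastcgi_ignore_headers Cache-Control Expires;

        fastcgi_cache_bypass $no_cache;         # do not answer these from cache
        fastcgi_no_cache $no_cache;             # and do not store their responses
    }
}
```

With a TTL this short, stale content disappears on its own, which is why no purge step appears anywhere in the sketch.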

I'm working right now on the much awaited Nginx cache module. And it's just a crawler that is aware of the patterns of content creation/updating on your site.

I think that having to deal with stuff like the Expire module and such is the wrong way to do it. It presumes that it's expensive to cache a page. It isn't. What's expensive is dealing with the expiration logic. But hey, if caching a page is cheap, why bother with it.

As for @jcisio. Nginx is not "threaded" like you wrote. It's event loop based. There are some plans to implement threading to deal with blocking I/O. But at the heart it's just a for(;;) that accepts connections, taking advantage of the OS provided facilities. AFAIK Apache doesn't care about epoll or anything like that.

The fact that Apache + mod_php still has such a stronghold in Drupal is, IMHO, quite sad. WP is much more forward looking in that respect. For example, wordpress.com runs solely on Nginx + php-fpm. This year they dropped litespeed, which they previously used for handling the PHP serving.

I'm sorry, perusio, I believe

Fidelix's picture

I'm sorry, perusio, I believe I was referring to the cache purge nginx module: http://labs.frickle.com/nginx_ngx_cache_purge/

Can nginx strip cookies or do stuff like user-agent specific bins?
If yes, then I was unaware of that, and I will certainly study this. Thank you for the information.

There's no need

perusio's picture

to use that module, the purge 3rd party module, that is. In Nginx the cache is just a set of files in a binary format. You can check to see if a particular URI is cached by using:

grep -Ri <URI> /path/to/cache/zone

Then a simple rm -f <cache file> will remove the file from the cache without any issue.

Yes it's not the most efficient way, but it works and is quite simple.

For stripping cookies there are several possible approaches. Yes it's not very complicated to have different cache zones for different UAs.
A quick example:

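# Pick a cache zone based on the User-Agent header.
# The ~ prefix makes UAone and UAtwo regular expressions.
map $http_user_agent $use_cache_zone {
  ~UAone  zone1;
  ~UAtwo  zone2;
}

location = /index.php {
  # 418 is used purely as an internal jump to the @zone2 location.
  error_page 418 @zone2;
  if ($use_cache_zone = zone2) {
    return 418;
  }
  ...
  include zone1_cache.conf;
}

location @zone2 {
  ...
  include zone2_cache.conf;
}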
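For the cookie half of the question, one of the possible approaches (a sketch, not necessarily the one meant above) is to drop cookies in both directions for content that never depends on them. The backend address and the extension list are assumptions.

```nginx
# Inside a server block: cookie-free handling of static assets
# that are proxied to a backend.
location ~* \.(css|js|jpg|jpeg|png|gif|ico)$ {
    proxy_pass http://127.0.0.1:8080;   # assumption: backend address
    proxy_set_header Host $host;

    proxy_set_header Cookie "";         # do not forward request cookies
    proxy_hide_header Set-Cookie;       # do not relay cookies set by the backend
    proxy_ignore_headers Set-Cookie;    # only matters once proxy_cache is enabled:
                                        # lets such responses be stored anyway
}
```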

This is beautiful. I guess

Fidelix's picture

This is beautiful.
I guess all that's needed is a way to expire pages directly from Drupal (i.e., the "nginx cache" module).

If there is anything I can do to help you in getting it done, let me know.
I'm good with PHP.

Not quite

perusio's picture

I think that the whole expiration logic thing is way too messy and complex. Instead you should follow a different approach. Cache everything with a small TTL and then use a crawler to keep the cache warm. There was recently an article on the Habrahabr Nginx Blog about it. It uses the UNIX utilities to get a list of the most visited pages and then feeds the URI list to wget for crawling and keeping the cache warm.

My module is along those lines. Yes, you could implement something like the Drupal 7 Cache Interface. But the whole point is to route around that. Keep it simple.

Thinking about it... as long

Fidelix's picture

Thinking about it... as long as you keep a relatively decent TTL and don't have that many pages, that could actually be a good approach.
But what about sites with 30,000 nodes?

UNIX utilities to get a list of the most visited pages and then feed the URI list to wget for crawling

This logic efficiency would be CRUCIAL to keep the load low.

It's hard to understand the process he is doing because the post is not that well detailed, and the script is a one-line script ^^
But I assimilated the basic logic, and I think it's nice, and could be very light.

How are you planning to "prioritize" some urls over others on your module?

Right now

perusio's picture

this module is to be used on a newspaper site. There the most visited pages are the most recent. Hence it's quite simple. As for having a way to determine which pages are most visited: there's the Nginx log, of course. If you use any statistics module (I know that's not exactly kosher on a site with high traffic) you could get it from there. If you use google analytics there's an API (with a module, needs further research) to get traffic data. So there are several possible approaches.

Accessing the Nginx log requires root access or adding a user to the adm group, which is not very wise. The best option is to use something like super or any other setuid wrapper to grep/sed/(M)AWK the log file.

Also the module part is just a form that implements the config. The real action happens in a drush command that is activated by a cronjob. For the newspaper, they have a new article every two minutes at least. So the TTL is 180s and the cache warmer will crawl the most recent 100 pages every 4 minutes, and it increments the page count on each run, so that at the end of the day it is crawling ~1000 pages. At night the load diminishes and you can reset the process and start all over again. This is synchronized with the usual schedule for news consumption, starting at 07h00 and going up to 00h00.

I don't think that the one-size-fits-all approach that dealing with expiration logic promotes is adequate for a site like a news site. So the idea is: tweak the cache to your usage patterns instead of blindly accepting an expiration logic magic pill.

Sounds very valid, for a

Fidelix's picture

Sounds very valid, for a general newspaper site.
However, this wouldn't solve most other use cases, where users are most interested in the inside content (coming from google or browsing naturally for content)

This is seeming too edgy for me, if I'm allowed to say, for more general sites.
Unfortunately, I'll have to wait for a more tunable and dynamic, app-level cache expiration control for nginx or somehow do it myself, which won't happen anytime soon.

What do you think of these?
http://drupal.org/project/purge
http://drupal.org/project/cache_actions
http://drupal.org/project/expire

Brian did that

perusio's picture

on a thread of the Nginx group. I've used that also. But I didn't like the way Nginx cache purge module works.

At the time it returned a 404 for each page purged, which is just plain stupid to say the least. Hence Brian added the return 200 to get a decent status code. Also there's no wildcard support AFAIK. Hence you cannot implement the D7 cache Interface if you want it. You're better off going in the grep -Ri direction than using the purge module, IMHO. Of course, if you have a huge cache with millions of files, it is going to be slow going down that directory structure. It improves if you have the cache mounted as tmpfs. Another option is to use different cache zones (bins) and keep them small and shallow (two levels deep at most).

It does look like the Drupal

Fidelix's picture

It does look like the Drupal community is tied at some level to Varnish. If its focus were on Nginx...

Unfortunately, if someone needs a reliable, quick and easy but still effective way to cache stuff at the moment, Varnish is the best choice. In front of nginx.
At least that's what I'm concluding right now.

Maybe we need better cache expiration control on the Nginx end for that to change.

Why nginx and Varnish?

DevElCuy's picture

I have a VPS running nginx as a reverse proxy on port 80 and apache on port 8080. nginx also has its cache enabled, so it loads static content and files into memory and eventually saves the cache to disk. So why use both Varnish and nginx?

--
[develCuy](http://steemit.com/@develcuy) on steemit

Really no reason you should

matthewv789's picture

Really no reason you should want to add Varnish in front of nginx. nginx is already at least as fast as Varnish (maybe faster), and already uses similar CPU and less memory.

Varnish is a perfectly good proxy cache, but the only reason I'd typically choose it over nginx would be to take advantage of edge-side includes.

There's no reason to use

Fidelix's picture

There's no reason to use apache either.
The good thing about Varnish is the amount of snippets and help you can find in the community.

Nginx's proxy part is not so popular, from what I feel. But I'd love it if that changed...

There is this ESI nginx module that has been "abandoned" since 2008:
https://github.com/taf2/nginx-esi

There is this "hack" too:
http://joshuajonah.ca/blog/2010/06/18/poor-mans-esi-nginx-ssis-and-django/

I'm not completely clear

brianmercer's picture

I'm not completely clear about the terminology here. nginx has a feature they call SSI which seems to do what people refer to as ESI, i.e. it lets you cache a page but insert a special call in the page that makes a request to the back end, and then integrates the back end response into the cached page.
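To illustrate the mechanism just described, here is a small Python sketch that mimics what happens when a cached page containing an include directive is served. It only models the substitution step, not nginx itself; `assemble` and `fetch` are hypothetical names, and `fetch` stands in for the request to the back end.

```python
import re

# Matches directives of the form <!--# include virtual="/some/uri" -->
SSI_INCLUDE = re.compile(r'<!--#\s*include\s+virtual="([^"]+)"\s*-->')


def assemble(cached_page, fetch):
    """Replace each include directive with the back end response for its URI."""
    return SSI_INCLUDE.sub(lambda m: fetch(m.group(1)), cached_page)


# The page body comes from the cache; only the block is fetched "live".
cached = ('<html><body><h1>News</h1>'
          '<!--# include virtual="/esi/block/user-menu" -->'
          '</body></html>')
backend = {"/esi/block/user-menu": "<ul><li>Log out</li></ul>"}
page = assemble(cached, backend.__getitem__)
```

The point is that the expensive page stays cached while the small per-user fragment is requested on every hit.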

In fact I used http://drupal.org/project/esi and made some small modifications that got nginx working in the way described. I filed a bug report about page specific visibility, http://drupal.org/node/1054954, but it worked.

The problem was that I use Panels for almost everything and there was only an experimental patch for Panels support, along with a general feeling that the module was not being actively developed.

I saw just this week that mikeytown2 has a fork and is interested in co-maintaining it. Here's hoping.

brianmercer, the feature

Fidelix's picture

brianmercer, the feature itself is there in nginx core. Technically, any server that has SSI is able to support ESI.
The problem is compatibility with the specification.

The ESI specification was mostly created by Akamai, so all caching webservers could standardize the work in the app layer.

As you said, you were able to get project/esi working with nginx for you. That's wonderful, and I'd like to hear more about that, if possible.

There is this D7 module that looks promising: http://drupal.org/project/esi_api
And it seems its author doesn't like the project/esi module.

From the author:

The currently available D6 version of the ESI module does not provide any kind of API; this is why I wrote ESI API. Because the D6 version is rather old and does pretty much nothing except exposing blocks via ESI tags, I can only try to guess what the differences are between what they are doing, so here it is, let's start with assumed similarities:

I made these changes to

brianmercer's picture

I made these changes to esi.theme.inc:

-  $src = $base_url . "/esi/block/{$bid}";
+  $src = "/esi/block/{$bid}";

-  $output = '<esi:include src="' . $src . '" />';
+  $output = '<!--# include virtual="' . $src . '" -->';
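The same two changes can be sketched outside the module as a small Python rewrite step, for anyone who wants to see the transformation in isolation. This is an illustration only, not code from the module; the function names are hypothetical, and dropping the scheme and host mirrors the removal of `$base_url` above, presumably because the nginx include takes a URI on the same server.

```python
import re
from urllib.parse import urlsplit

ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def to_relative(src):
    """Drop scheme and host, keeping only the path and query string."""
    parts = urlsplit(src)
    return parts.path + ("?" + parts.query if parts.query else "")


def esi_to_ssi(html):
    """Rewrite <esi:include src="..." /> tags as nginx SSI include directives."""
    return ESI_INCLUDE.sub(
        lambda m: '<!--# include virtual="%s" -->' % to_relative(m.group(1)),
        html,
    )
```

For example, `esi_to_ssi('<esi:include src="http://example.com/esi/block/42" />')` yields an include directive pointing at `/esi/block/42`.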

I haven't tried mikeytown2's fork with the Panels patch yet.

EDIT: I just looked at the fork and he's already got the nginx SSI format in there.

I tested that ESI fork in

brianmercer's picture

I tested that ESI fork in mikeytown2's sandbox. After some issues, it's working nicely with Panels and nginx. I did some quick and dirty ab testing of a page with 4 views on it vs a fully cached page where one of the views was fetched "live" by nginx and there was a substantial speedup.

That's wonderful news. Can

Fidelix's picture

That's wonderful news. Can you describe which issues you ran into?

I put them into the issue

brianmercer's picture

I put them into the issue queue and mikeytown2 fixed them the same day.