Caching and 404s for aggregated js and css in D6

Posted by brianmercer

It's a recurring problem even without local caching, but especially with it. You update a module and then run the database updates. Part of that is clearing the caches, so Drupal regenerates your aggregated js and css, which makes sense in case the updated module has changed its js or css files.

And then the problems occur. If you have a reverse proxy in front of your web server, it is still serving old cached pages that refer to old aggregated files. But those files no longer exist because Drupal has regenerated them and deleted the old ones.
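To make the failure concrete, here is a toy Python model of that sequence. It assumes that the aggregate filename is a hash of the included file list plus a token that changes on every cache flush; that is my reading of how Drupal 6 behaves, not its actual code, and aggregate_name(), flush_token and the example paths are hypothetical.

```python
import hashlib
from typing import List

def aggregate_name(prefix: str, files: List[str], flush_token: str) -> str:
    # Hypothetical model, not Drupal's code: the name hashes the file list plus
    # a token that (we assume) changes whenever the caches are flushed.
    digest = hashlib.md5((repr(files) + flush_token).encode("utf-8")).hexdigest()
    return f"{prefix}_{digest}.{prefix}"  # e.g. css_<hash>.css or js_<hash>.js

files = ["modules/system/system.css", "sites/all/themes/mytheme/style.css"]

before = aggregate_name("css", files, flush_token="a1")  # referenced by pages the proxy cached
after = aggregate_name("css", files, flush_token="b2")   # regenerated after the cache clear

# Drupal deletes the old aggregate, so only the new name exists on disk.
on_disk = {after}
print(before in on_disk)  # False: cached pages now point at a missing file (404)
```

The same model shows why the proxy is not at fault: it faithfully serves a page that names a file the backend no longer has.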

Even if you don't have a caching proxy in front of your site you might still see 404s in your logs for your js and css files. There's plenty of discussion on it: http://www.google.com/search?q=site%3Adrupal.org+404+aggregated+js+css

Here's my current solution using nginx caching. Please comment to improve it or offer a better solution.

nginx provides native caching. It should be simple to just cache the js and css files and serve them from the cache for some very long period until they're no longer needed. They're small, so even if we save them for 6 months they won't take up much disk space, even though you might end up with several hundred of them as you update your modules.

But it's not so simple. The problem is that nginx caching only works for proxying to backends and not for static files. It's made to proxy a page from your php-fpm or apache server. If you want to cache a static file, you have basically 3 choices:

  1. serve the static files from your backend like php5-fpm
  2. use a module like slowfs cache
  3. create a new server in nginx and have it proxy to itself

None are very elegant. The first is the simplest, and since you're only talking about a couple dozen requests, it's not very taxing on your backend, but it does feel like a waste of CPU cycles. The second is made to order since the module was designed for caching static files that might exist on another system mounted through NFS, but you have to recompile nginx with a 3rd party module which might or might not work with the next version of nginx. The third option requires no backend interaction and no third party modules, so I went with that.

First I created a new listening server on an arbitrary localhost port:

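## Serve static files for caching
server {
  # Reachable only from this machine, on an arbitrary unused port
  listen 127.0.0.1:9090 default;
  # Per-domain docroot, taken from the Host header the proxy passes along
  root /var/www/$host/public;

  # Refuse everything except the aggregated css/js files
  location / {return 403;}
  location ^~ /files/css/css_ {}
  location ^~ /files/js/js_ {}
}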

Since it listens only on localhost, it's not accessible from outside. It has two locations that serve only the aggregated css and js files; everything else gets a 403. It uses the $host variable, so it works for multiple domains. I don't use Drupal's multisite layout with sites/[domain.com]/files/, so my aggregated files are in /files/css/ and /files/js/.
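A small Python sketch of what the internal server decides for each request: allowlisted prefixes, a per-host docroot, 403 for anything else, and a plain 404 if the file is gone. internal_server() is a hypothetical model for illustration, not nginx code, and it skips details such as URI normalization.

```python
from typing import Optional, Set, Tuple

# The only URI prefixes the internal 127.0.0.1:9090 server will serve.
ALLOWED_PREFIXES = ("/files/css/css_", "/files/js/js_")

def internal_server(host: str, uri: str, existing: Set[str]) -> Tuple[int, Optional[str]]:
    """Hypothetical model of the internal server's response for one request."""
    if not uri.startswith(ALLOWED_PREFIXES):
        return 403, None  # location / {return 403;}
    path = f"/var/www/{host}/public{uri}"  # root /var/www/$host/public;
    if path not in existing:
        return 404, None  # the file was already deleted (or never existed)
    return 200, path

disk = {"/var/www/example.com/public/files/css/css_abc.css"}
print(internal_server("example.com", "/files/css/css_abc.css", disk))
# (200, '/var/www/example.com/public/files/css/css_abc.css')
print(internal_server("example.com", "/index.php", disk))  # (403, None)
```

Because the docroot comes from the Host header, the same listener serves every domain, which is why the proxy has to forward the original Host.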

Then I just need the caching section in my regular server:

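  location ^~ /files/css/css_ {
    # Fetch from the internal server, passing the original Host so its $host picks the right docroot
    proxy_pass http://127.0.0.1:9090;
    proxy_set_header Host $http_host;
    # One shared cache for all domains: the key includes the host name
    proxy_cache_key $host$uri;
    proxy_cache static;
    # Serve the cached copy for 180 days without checking the backend again
    proxy_cache_valid 200 180d;
    expires max;
  }

  location ^~ /files/js/js_ {
    proxy_pass http://127.0.0.1:9090;
    proxy_set_header Host $http_host;
    proxy_cache_key $host$uri;
    proxy_cache static;
    proxy_cache_valid 200 180d;
    expires max;
  }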

Because I'm using "priority" prefix strings (^~) for efficiency, nginx won't even check the regex locations (~* or ~) when one of these is the longest matching prefix, so it doesn't matter where in the server section they occur. If you're using /sites/[domain.com]/files, you'll have to make them regex locations (~*) and place them before your regular css/js location. They pass the request to the local server and cache the file with a key that includes the domain, so one cache serves all your domains. The cached file stays valid for 180 days and then expires.
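The location-matching rules this relies on can be modeled in a few lines of Python. This is a simplified sketch of nginx's documented selection order (exact "=" matches and nested locations are left out); Location, select_location() and the sample config are illustrative names, not anything nginx exposes.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Location:
    modifier: str  # "", "^~", "~" or "~*" ("=" omitted in this sketch)
    pattern: str
    name: str

def select_location(uri: str, locations: List[Location]) -> Optional[str]:
    # 1. Find the longest matching prefix location, wherever it sits in the config.
    prefixes = [l for l in locations if l.modifier in ("", "^~") and uri.startswith(l.pattern)]
    best = max(prefixes, key=lambda l: len(l.pattern), default=None)
    # 2. If that winner is ^~, stop: regex locations are never tried.
    if best is not None and best.modifier == "^~":
        return best.name
    # 3. Otherwise try regex locations in config order; the first match wins.
    for l in locations:
        if l.modifier == "~" and re.search(l.pattern, uri):
            return l.name
        if l.modifier == "~*" and re.search(l.pattern, uri, re.IGNORECASE):
            return l.name
    # 4. No regex matched: fall back to the remembered prefix, if any.
    return best.name if best is not None else None

config = [
    Location("~*", r"\.(css|js)$", "generic static css/js"),
    Location("", "/", "drupal front controller"),
    Location("^~", "/files/css/css_", "cached css aggregate"),
]
print(select_location("/files/css/css_abc.css", config))  # cached css aggregate
print(select_location("/misc/drupal.js", config))          # generic static css/js
```

Note that the ^~ location wins even though it is listed last. With the multisite layout both rules would be regexes, so only their order in the config decides which one wins.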

Oh yeah, we need to define the cache in our http section like so:

  proxy_cache_path /var/cache/nginx/static levels=1:2 keys_zone=static:1m max_size=100m inactive=180d;
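For reference, nginx stores each cache entry in a file named after the MD5 of the cache key, and levels=1:2 builds two directory levels from the end of that hash (the nginx documentation's example is /data/nginx/cache/c/29/b7f54b2df7773722d382f4809d65029c). A Python sketch of that mapping, using the $host$uri key from the config; cache_file_path() is a hypothetical helper.

```python
import hashlib
from typing import Sequence

def cache_file_path(cache_dir: str, key: str, levels: Sequence[int] = (1, 2)) -> str:
    """Hypothetical helper: where nginx would store the entry for `key`."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()  # file name = MD5 of the cache key
    parts = []
    end = len(digest)
    for width in levels:  # each level takes the next chars, counting from the end of the hash
        parts.append(digest[end - width:end])
        end -= width
    return "/".join([cache_dir.rstrip("/"), *parts, digest])

# proxy_cache_key $host$uri; gives keys like "example.com/files/css/css_abc.css"
print(cache_file_path("/var/cache/nginx/static", "example.com/files/css/css_abc.css"))
```

Since the host is part of the key, the same URI on two domains lands in two different cache files.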

Since we set the cache to be valid for 180 days, it never checks for the actual file after the first retrieval, so it doesn't matter if Drupal has deleted it. After 180 days it would check the actual location, but hopefully by 6 months all your caches are up to date.
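The reasoning above can be checked with a toy Python model of a time-based cache. TimedCache is a hypothetical stand-in for proxy_cache_valid only; it ignores inactive=, max_size eviction and cache locking, and it caches only successful responses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

DAY = 24 * 60 * 60  # seconds

@dataclass
class TimedCache:
    """Toy model of proxy_cache_valid: a fresh entry is served without contacting the origin."""
    valid_for: int  # seconds, e.g. 180 days
    entries: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)

    def get(self, key: str, now: float, fetch: Callable[[], Optional[bytes]]) -> Optional[bytes]:
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.valid_for:
            return hit[1]  # still valid: the origin is never consulted
        body = fetch()  # missing or expired: ask the origin again
        if body is not None:  # only successful (200) responses are cached
            self.entries[key] = (now, body)
        return body

# Usage: the origin file disappears after the first request is cached.
origin = {"/files/css/css_abc.css": b"body{}"}

def fetch_from_origin() -> Optional[bytes]:
    return origin.get("/files/css/css_abc.css")  # None stands in for a 404

cache = TimedCache(valid_for=180 * DAY)
key = "example.com/files/css/css_abc.css"  # shaped like $host$uri
cache.get(key, now=0, fetch=fetch_from_origin)  # first request populates the cache
del origin["/files/css/css_abc.css"]  # Drupal clears caches and deletes the file
print(cache.get(key, now=90 * DAY, fetch=fetch_from_origin))   # b'body{}'
print(cache.get(key, now=181 * DAY, fetch=fetch_from_origin))  # None
```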

So what do you guys think? Am I crazy to go through this trouble just for a few 404s? Or should you just purge every page from your reverse proxy cache every time you have to update a module?

Comments

Posted by omega8cc

If I understand it correctly, you end up with a chain of three requests (frontend proxy + internal nginx proxy + nginx) just to serve aggregated static files. That doesn't sound efficient in any way.

It could be much simpler to use Boost to cache the css/js files. And if you really need an extra proxy in front of nginx, why not introduce exceptions in the frontend proxy so that it passes all requests for aggregated css/js files to the backend nginx, with Boost enabled only for css/js files?

Posted by brianmercer

1. This problem is not limited to the use of Boost or a reverse proxy. Even if you're just using nginx and php5-fpm, there are other proxies out on the internet and there's browser caching as well, so even with a plain Drupal config you might see 404s on these aggregated files in your server log. This is a simple and fast piece of protection.

And if you are using a reverse proxy, you may not want to install a large and complicated module like Boost just to address this small problem. Even the Boost dev suggested a separate module for this issue, although one that involves php code and a mysql lookup table. http://groups.drupal.org/node/53973

2. If you're using Boost for caching, then you'd only have nginx and php5-fpm. In that case, I'd say nginx is lighter and more efficient at managing the aggregated static files than Boost's PHP code and MySQL access. You don't need Boost's complicated expiration code for aggregated files; a simple time-based cache will do.

3. Also, development started this week on a new purge module that leverages the expiration code from Boost (in the expire module) and allows you to use nginx caching in place of Boost and lets Drupal purge the expired pages from the nginx cache. I'll post some sample configs later today or tomorrow for your feedback.

Thanks for the comments.

Posted by omega8cc

This issue (remote proxies etc.) is not exactly "our" problem, at least as long as we use sensible defaults for the expiry times in the HTTP headers. We can't do much about remote proxy servers that don't honor the headers. Also, I don't think it is a good idea to use very long expiration times for cached aggregated css/js files, as you suggest. I'd even say it is better for visitors to see a "broken" design due to missing or renamed styles, as it signals that they should hit the reload button or clear their browser cache. Otherwise you may end up with something even worse than the results of misconfigured proxies: outdated content and/or design served for even longer, since any such proxy will still add its own expiry time on top of the already outdated page or styles.

As for using Boost, it doesn't really matter how big it is, since it is responsible for only two actions: initial caching and purging the cache. And remember that only the initial caching is tied to a single anonymous page view; purging the cache is a backend cron job, and serving the files is a fast, one-step job done purely by Nginx. I would prefer to pay that price and keep things simple, instead of adding more layers of caching (even if they live in the fast web server space). Otherwise you multiply the requests for every aggregated file, while (imo) you don't have enough control over when and how to purge the cache to keep it really useful, and you risk introducing new issues and confusing webmasters, who may spend hours or days guessing why their theme changes are not visible. The cost you propose is, in my opinion, too high compared to the size of the initial problem.

Of course, if we could use some tiny, fast and smart Drupal module that talks to the nginx cache directly, that could be very interesting! I will watch this space for news!

Thanks!

Posted by jcisio

About aggregated files: never count on expiration. When you want to change them, just add a dummy CSS/JS file to the theme and clear the theme registry; you'll get new aggregate filenames. It's that easy.

Boost is really good (though I have to turn off db for the "old school" mode).

Posted by brianmercer

I'm not sure I see the problem with long expire times on aggregated files. Those files are never updated with the same name. If they are changed they have a different filename, so there should never be a case where you need to refresh an aggregated file.

Posted by omega8cc

Hmm.. right, I was probably wrong in this case. This shouldn't be an issue.

Posted by omega8cc

What we are talking about here are workarounds for (bad) behavior in Drupal core: the filenames of aggregated css/js files shouldn't change just because you purged the caches. While it would be good to have a fast and smart workaround, maybe we could also support opinions like this one, http://drupal.org/node/721400#comment-2703462, to make the problem much smaller at the source?

Posted by brianmercer

Yes, exactly. I know there was talk about addressing these problems in D7, but I dunno what came out of them.

Other CMSes reference something like aggregated.css?a0s9df809s8d098s0d980f98df. Because the webserver ignores the query string, it always serves the latest file; when you post a new version, you change the hash in the query (or use a date/timestamp), and proxies and caches treat it as a new file. Dunno why that approach was not taken.
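A minimal Python sketch of that query-string scheme, assuming a content hash as the version token; versioned_url() and webserver_lookup() are hypothetical helpers, not part of any CMS.

```python
import hashlib
from urllib.parse import urlsplit

def versioned_url(path: str, content: bytes) -> str:
    """Hypothetical helper: append a content hash as the query string."""
    return f"{path}?{hashlib.md5(content).hexdigest()}"

def webserver_lookup(url: str) -> str:
    """The origin ignores the query and serves whatever file is at the path."""
    return urlsplit(url).path

old = versioned_url("/files/aggregated.css", b"body{color:red}")
new = versioned_url("/files/aggregated.css", b"body{color:blue}")
print(old != new)                                      # True: caches keyed on the full URL see a new resource
print(webserver_lookup(old) == webserver_lookup(new))  # True: both resolve to the same file on disk
```

One caveat for an nginx cache like the one in this thread: a key of $host$uri ignores the query string, so the key would need something like $request_uri for this scheme to take effect there.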

Brian's solution

Posted by perusio

Is a nice hack taking advantage of Nginx capabilities. This is obviously only something to be done on production sites. If you're developing the site/theme then turning on CSS and JS aggregation is setting yourself up for a bad experience.

I do believe that a module that talks directly with Nginx cache from Drupal is a direction worth exploring. Nginx can offer a much better solution than Varnish for caching, IMHO.

It's high time a suite of Nginx related modules sees the light of day.

Posted by sugiggs

I'm having a similar issue, and I will probably proxy to the webserver behind nginx (Apache, etc.) to serve the css and js the first time.

Solution