We all know that caching in its many forms can speed up your site tremendously while also lowering your hardware requirements. Until now, these caching solutions have mostly involved caching a full page of data. However, progress is being made on selectively caching only portions of a page, which gives the benefits of caching while varying the freshness of different portions of a given page, or allows some data to be served in real time.
Most of us are familiar with using Varnish, the Boost module, or nginx native caching to cache full pages of content, which are then served to (usually anonymous) users without having your PHP interpreter bootstrap your Drupal code. In the case of the Boost module, Drupal generates full pages as static files and then your web server serves these files directly. With a reverse proxy such as Varnish or nginx, your Drupal backend generates the page output and tags it with a cache control header that tells the frontend reverse proxy to serve that page content until the cache time expires, and then to check back with the backend.
One problem with these systems is that in most cases, the majority of the page doesn't change. Often it is only one small part of a page that changes.
Take for example a typical node page that contains headers, footers, various menus, node content, sidebars for navigation and related content titles, and comments at the bottom. None of the headers, footers, or node content are going to change for hours, days or months. A list of related content will change as new node content is added, but that only happens once or twice a day. The only portion of content that will change often is the comment section. Taken one step further, let's assume that people can also log into the website and that when they do, the top menu bar of the page will change to offer new menu items like a link to a personal user page.
One way to solve the problem of selective caching is with ajax. Your reverse proxy caches a full page, but the related content, comments and top menu are not contained in the page. Instead those sections are replaced with JavaScript that tells the client browser to make another request to the server to get that content, and then the browser dynamically fills the content into the page. The http://drupal.org/project/ajaxify_regions module works this way. This is already how most advertisements are served, since the ads don't come from the page server anyway. This can work really well in the case of comments because services like Disqus and Facebook Comments also allow single sign-on and various social features. However, there are downsides to ajax, including the load that it places on the browser, the added latency of extra trips back to the server, and the complications of JavaScript. It's often slow and jerky.
Another solution has been developed by a group of companies in the form of a specification called Edge Side Includes, or ESI. ESI allows a company like Akamai that has servers all over the world to cache full pages on the "edge" of the route to a client, but also refresh selective portions of the page from the origin server. The Akamai Content Delivery Network acts as a reverse proxy between the origin server and the client. The full page from the origin server is embedded with special ESI XML elements. After the edge server caches the full page, it then interprets the ESI elements and makes further requests to the origin server to fill in the needed portions of the page. The page assembly is performed on the edge server and then a completed page is served to the client.
This may seem inefficient since the edge server has to make multiple requests instead of a single request as in a typical full page model. However each ESI resource can be given its own cache time. In our example above, the related content block might only be refreshed once or twice a day, and therefore the edge server will use a cached copy of the content most of the time. Using a special cookie can also allow user or role specific content to be embedded as ESI resources. A top menu can change based on a user's role as contained in a session cookie.
Most of us don't have access to expensive Content Delivery Networks, but we do have a great tool at our disposal in the nginx web server. The nginx web server contains its own in-markup language that it calls Server Side Includes, or SSI. Combined with its facility as a caching reverse proxy, nginx can give us many of the benefits of the ESI specification.
Two years ago a Drupal module appeared that offered to enable the benefits of ESI on Drupal: http://drupal.org/project/esi. The ESI module was designed to work with the Drupal block system to selectively serve ESI resources to Varnish and Akamai's CDN using the ESI specification. The Pantheon folks have also been interested in seeing ESI work with Varnish (see http://groups.drupal.org/node/50093), and some work on the ESI module was conducted on github before Drupal.org switched to git: https://github.com/dstuart/Drupal-ESI-Module/network.
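To make the idea concrete, here is a minimal sketch of what nginx SSI involves. The two directives are standard nginx ones. The fragment path is purely hypothetical and is not the path format the ESI module actually generates.

```nginx
# Sketch only. These directives may sit at http, server, or location level.
ssi on;                # scan text/html responses for SSI commands
ssi_silent_errors on;  # suppress the default error text if an include fails

# With SSI enabled, a page coming from the backend can carry a marker such as
#   <!--# include virtual="/esi/hypothetical/fragment" -->
# and nginx replaces it with the response of a subrequest to that URI.
#
# The ESI counterpart, for proxies that speak ESI, looks like
#   <esi:include src="/esi/hypothetical/fragment" />
```

The marker is an ordinary HTML comment, so if SSI is switched off the browser simply ignores it and the fragment is missing from the page.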
I experimented with this module in February, 2011 and made it work with nginx SSI, but didn't follow up with it for two reasons: 1. The Drupal block system isn't terribly flexible and putting things like comments into blocks requires custom code, and 2. Despite some interest and code contributions, the project wasn't headed towards a stable form. Last month that changed.
One of our favorite coders among the server optimization crowd, mikeytown2 of Boost, Advagg, and Expires fame, forked the ESI module and started developing: http://drupal.org/sandbox/mikeytown2/1328648.
He committed some Panels code and other things from the ESI issue queue and started crushing bugs. He added an option for the nginx SSI language to the module as well. It's no wonder that he's so popular here at the nginx group.
I've become a big fan of the Ctools/Page Manager/Panels system and I use the Panels Everywhere module to replace the Drupal block system entirely on my personal site. Ctools allows caching plugins for specific panels of content and is really ideal for this type of application. For this reason I haven't tested the ESI module fork with the block system at all, but I can offer the nginx configuration that I'm using with the ESI module fork and Panels.
I've written here before about nginx as a caching reverse proxy. This is what omega8cc calls Speed Booster in her terrific integration of Aegir with nginx. It requires Pressflow for Drupal 6 since Pressflow adds code to make Drupal friendly to full page caching by reverse proxies. On the performance page you enable External caching and set the cache time for your full page.
Mikeytown2's ESI module fork must be downloaded and enabled and SSI should be selected on its settings page.
I won't go into the full use of Panels here, but those who use it know how to create a page of panels. The settings for particular panes have a Caching setting and a new option will appear for ESI. After setting ESI, you will have options for the Panel Cache Scope and TTL. For our example with a related content block you would use the Global scope with a TTL of 12 hours. That means that nginx would cache the related content panel and refresh it twice a day. To serve live comments you would set the comments panel to ESI but set the Panel Cache Scope to Not Cached. That means that on each page request, nginx will request the comments live from the php backend.
Perusio and omega8cc have both contributed great full page caching configurations here on our group. For my purposes I'm using the following basic configuration:
    location = /index.php {
        include /etc/nginx/fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $basepath/drupal/index.php;
        fastcgi_param HTTPS $php_https;
        fastcgi_pass php;
        set $nocache "";
        ## bypass cache for logged in users
        if ($http_cookie ~ SESS) {
            set $nocache "Y";
        }
        fastcgi_cache mycache;
        fastcgi_cache_key $scheme$host$request_uri;
        fastcgi_ignore_headers Expires;
        fastcgi_cache_bypass $nocache;
        fastcgi_no_cache $nocache;
        add_header X-nginx-Cache $upstream_cache_status;
        expires epoch;
    }

Experienced nginx users will recognize the limitations of this particular configuration, for example the cookie checking bypasses all caching for logged in users. I offer this config only to show that there's nothing special needed for ESI at this point.
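The location above relies on several names it does not define: the mycache cache zone, the php upstream, and the $basepath and $php_https variables. A minimal sketch of how such definitions might look follows. Every path, size, and socket address here is a placeholder assumption, and none of these values come from the original configuration.

```nginx
# http context. All paths, sizes and addresses are illustrative assumptions.

# Cache zone referenced by "fastcgi_cache mycache;".
fastcgi_cache_path /var/cache/nginx/mycache levels=1:2
                   keys_zone=mycache:10m inactive=1d max_size=1g;

# Upstream referenced by "fastcgi_pass php;".
upstream php {
    server unix:/var/run/php-fpm.sock;
}

# Value passed to PHP as the HTTPS fastcgi parameter.
map $scheme $php_https {
    default off;
    https   on;
}

server {
    # Hypothetical directory that contains the drupal/ folder.
    set $basepath "/var/www/example";

    # ... listen, server_name, and the location blocks go here ...
}
```

The inactive and max_size parameters only bound how long unused entries stay on disk and how large the cache may grow. The freshness of each entry still comes from the caching headers the backend sends.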
My SSI snippet is only slightly different. Like many of us I use includes to add different configuration snippets to different sites. My includes/ssi snippet:
    ssi on;
    ssi_silent_errors on;
    location ~ ^/(?<esi>esi/.*)$ {
        internal;
        include /etc/nginx/fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $basepath/drupal/index.php;
        fastcgi_param QUERY_STRING q=$esi;
        fastcgi_param HTTPS $php_https;
        fastcgi_pass php;
        set $nocache "";
        if ($http_cookie ~ SESS) {
            set $nocache "Y";
        }
        fastcgi_cache mycache;
        fastcgi_cache_key $scheme$host$uri$args;
        fastcgi_ignore_headers Expires;
        fastcgi_cache_bypass $nocache;
        fastcgi_no_cache $nocache;
        add_header X-nginx-Cache $upstream_cache_status;
        expires epoch;
    }

Since I insert this snippet through an include at the server level of my configuration, the ssi directives apply at the server level. The location is designed to capture any path that begins with /esi/. The internal directive makes sure that people cannot access the esi path directly. This is especially important if you allow your panel to cache restricted content or if you use the direct injection feature of the ESI module to put session data in the path.
Also note that while I use $request_uri in my index.php location, I have to use $uri and $args in the ESI location. This is because $request_uri always contains the URI of the original page request, even inside a subrequest, while $uri and $args change to those of the SSI subrequest. A quirk of nginx allows the Expires header to override the Cache-Control header, so ignoring that header is important.
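One way to see the difference between $request_uri and $uri for yourself is to log both for every subrequest. The following is a sketch under assumptions: the format name ssi_debug and the log file path are made up for illustration, while log_format, log_subrequest and access_log are standard nginx directives.

```nginx
# http context: a log format that shows both variables side by side.
# "ssi_debug" is an arbitrary name chosen for this sketch.
log_format ssi_debug '$request_uri -> $uri$is_args$args '
                     'cache=$upstream_cache_status';

# server or location context:
log_subrequest on;   # subrequests are not written to the access log by default
access_log /var/log/nginx/ssi_debug.log ssi_debug;   # hypothetical path
```

With this in place, each SSI include should produce its own log line in which the left side stays at the address of the original page and the right side shows the /esi/ path of the fragment, along with whether that fragment was served from the cache.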
If you review the issue queue of mikeytown2's ESI fork you'll see several quirks that I've found and which mikeytown2 has either fixed or that we've worked around. Some of these have to do with Panels custom styles and access control. He's been adding these to the documentation thread: http://drupal.org/node/1369180.
You might be curious what kind of time and load savings you achieve with this type of configuration. While there's been discussion of adding an esi.php that only performs a light bootstrap, no code for it has been committed. For now, every ESI resource requested performs a full Drupal bootstrap, so you may wonder if requesting only the comments with each page load is even worth the effort. The devel module makes this easy to test. Just enable the page timer and you will get a page build time not only for the full page, but also for each ESI resource. You can compare a full page build with just requesting the comments panel. For me it is the difference between roughly 350ms to build an entire page and about 125ms to fetch only the comments panel. Since it takes nginx only a millisecond or two to fetch the full page from the cache and integrate the comment panel into the page, you get a very large performance increase and a significantly lower hardware load.
If your site has complex regions (e.g. views) that take significant time to build but aren't refreshed often, the ESI module will not only allow you to cache them, but also let you easily control their refresh time. It does so without the PHP code of the views cache or the panels simple cache, and without having to store them in the database or another store. To some extent, the nginx cache can replace a complicated memcache (redis, etc.) configuration with a simple file based storage solution that's integrated right into one of the lightest web servers and takes advantage of the virtual file system memory cache.
This module and configuration are very much a work in progress. I'm offering them so that the nginx folks here can work with them and with mikeytown2 to make this into a powerful solution for server optimization.
Comments
One of the best posts I've
One of the best posts I've seen in this group.
IMHO, this is the future of efficient caching, and at some point, this will become a standard caching technique on the Drupal ecosphere.
Unfortunately I haven't had a
Unfortunately I haven't had a chance to review Mike's code yet and merge it with the ESI module, but for sure this is great work and I hope to spend some time working with this soon. Thanks again Mike!
--
Marcus Deglos
Founder / Technical Architect @ Techito.
Good
This joins together two of my favorite things: Panels (I've been a big fan of Panels Everywhere since the beginning) and Nginx. I'm thinking that going outside of Drupal, perhaps using the Embedded Lua module, is the way to avoid doing a bootstrap altogether. No need for any additional PHP code.
The Embedded Lua module can do non-blocking I/O and use all Lua available libraries.
I agree 101% that taking full advantage of the Nginx cache is the way to go. No need to increase the complexity and bring in more dependencies. If you're using the Nginx cache, this first step towards implementing busy locks might/should interest you. I'm currently using it in production on all the sites I maintain. No issues so far.
It's good to see the cache
It's good to see the cache code getting some dev time.
For those interested in nginx
For those interested in nginx and ssi, there's an interesting project from chk at http://drupal.org/project/redis_ssi
ESI for regions
I've made significant progress. I created a ticket and will submit a quick and dirty patch. See http://drupal.org/node/1907054. Help refactoring / extending would be appreciated!
Article very interesting
Article very interesting
I discovered Nginx and am exploring all the possibilities to improve performance, especially for authenticated users.
The ESI module looks very promising.
If I understand correctly, the value of using SSI is, among other things, to cache pages for authenticated users (with the key $scheme$host$uri$args) and to serve the customized blocks (or panels) through a PHP call, tagged according to the logged-in user.
I looked at your code carefully to configure Nginx with your snippet and there is one thing I do not understand. It seems that with this configuration in your SSI location, you continue to bypass the Nginx cache for authenticated users. Am I wrong?
Thank you again.
You are correct. The posted
You are correct. The posted configuration bypasses the cache for authenticated users.
For demonstration purposes I was thinking of a common use case where you post a couple articles a day and people can comment, but most of your traffic is non-logged in readers. You want the readers to get live comments, but the rest of the site is only refreshed once or twice a day.
I mention another use case, where most of the site is static, but there is an individualized menu bar at the top. I don't have a sample configuration for that, but it would require removing the session cookie bypass.
Any complex site for logged in users would require you to plan out your site carefully, the same way you would with the authcache module. And depending on how many of your blocks or panels need to be live, you may or may not see a large performance increase. If your site is set up so that you have 8 live panels, then it is probably less efficient to make 8 requests and do 8 bootstraps than to just do it all at once.