Authenticated User Caching Thoughts From Another CMS

kwinters's picture

I just wanted to share some technical information about a very different caching scheme, in hopes that someone would be able to make use of the information for Drupal.

First, a brief background: Coalmarch Productions (the company I currently work for) spent its development budget largely on an in-house CMS (CoalEngine, or CE) with performance, SEO, and flexibility as primary goals. This lasted for years with considerable success until Drupal caught up with us. Since CCK, Views, and a number of other wonderful modules make our custom development costs lower than they were with our own system, we now use Drupal almost exclusively for new sites.

Rather than Drupal's node / taxonomy / block / menu system, CE organizes data into "Elements" which are basically nodes and "Pages" (a collection of elements organized into sections, plus a tree structure to determine the URL). Looking at the g.d.o submit discussion page right now, this could potentially be a "Page" with breadcrumb, search box, etc. elements in the Header section, groups blocks on the right in the sidebar section, and the OG form in the body section.

This setup led naturally to a two-tier caching system. We store a cache for a Page, which is always the same for a given URL and saves us from looking up the Element references on every request. We also store Element caches, which hold pre-rendered HTML for a given element.

When a request is made, the flow looks like this:

  • Map URL to a page cache file
  • Build page cache and element loaders if they don't exist
  • Include page cache file, which outputs the document tag and some very high level structure for sections
  • Page cache includes each element's "Loader" file
  • The loader file checks to see if the needed element cache exists, and if not, loads the full bootstrap and renders it
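The flow above can be sketched roughly as follows. This is a minimal Python illustration, not CE's actual code (CE is a separate in-house system): the dictionaries stand in for cache files on disk, and every name here (page_cache, element_cache, full_render, lookup_elements, load_element, handle_request) is hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-ins: the system described above uses files on disk;
# plain dicts keep the sketch short.
page_cache: Dict[str, List[str]] = {}   # URL -> ordered element ids
element_cache: Dict[str, str] = {}      # element id -> pre-rendered HTML
render_calls: List[str] = []            # records every expensive render


def full_render(element_id: str) -> str:
    """Stands in for loading the full bootstrap and rendering one element."""
    render_calls.append(element_id)
    return f"<div id='{element_id}'>rendered {element_id}</div>"


def lookup_elements(url: str) -> List[str]:
    """Stands in for looking up which elements a Page references."""
    return ["breadcrumb", "search-box", "body"]


def load_element(element_id: str) -> str:
    """The 'loader': return cached HTML, rendering only on a cache miss."""
    if element_id not in element_cache:
        element_cache[element_id] = full_render(element_id)
    return element_cache[element_id]


def handle_request(url: str) -> str:
    # Tier 1: the page cache remembers which elements make up this URL.
    if url not in page_cache:
        page_cache[url] = lookup_elements(url)
    # Tier 2: each element comes from its own cache via its loader.
    parts = [load_element(element_id) for element_id in page_cache[url]]
    return "<html><body>" + "".join(parts) + "</body></html>"
```

On a second request for the same URL, neither lookup_elements nor full_render runs, which is the point of the two tiers.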

When the page cache is created, it also creates element "Loader" files, which are basically mini bootstraps. An element can be set to cache on a location like $_SESSION['userid'], and the first time a given user requests that element, the loader has to create the cache for it. This way we know that the page cache doesn't have to do any logic or need the bootstrap, and the element loaders will always exist if the page wants to use the element, so we don't have to load most of the engine until we hit an element that is not cached. Element loaders are never deleted unless the cache location changes (via a code change), and then the page caches are blown out as well so that they get remade.

The real strength comes from how elements determine where to cache. If the element always renders the same for everyone, use the default (single-cache location). If it is different per user, for example a login box that says "Welcome, Joe", then you add the user ID string to the cache path. If you want it to cache differently per role, then you put the role in the cache path. We also use it for our paging system by putting a GET or POST param (carefully) into the cache path, so ?page=1 and ?page=2 can each cache separately but still be shared between all users.
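As a rough illustration of the cache-path idea, here is a small Python sketch. The function name cache_path, the "cache/" prefix, the ".html" suffix, and the sanitizing rule are all assumptions made for the example, not CE's real layout. The sanitizing step is one way to read "carefully": request parameters must not be able to inject path separators.

```python
from typing import Mapping, Sequence


def cache_path(element_id: str, vary_on: Sequence[str],
               context: Mapping[str, str]) -> str:
    """Build a cache location from the keys an element says it varies on.

    vary_on lists context keys such as "userid", "role" or "page".
    An empty list gives the default single-cache location.
    """
    parts = ["cache", element_id]
    for key in vary_on:
        value = context.get(key, "none")
        # Keep only characters that are safe in a file name, so a request
        # parameter cannot inject "/" or ".." into the path.
        safe = "".join(ch for ch in value if ch.isalnum() or ch in "-_")
        parts.append(f"{key}-{safe or 'none'}")
    return "/".join(parts) + ".html"
```

An element that varies only on "page" ignores the user ID in the context, so every user shares the same cache entry for ?page=2.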

So, the advantages to this system:

  • Like block cache, you can render a chunk of html on one page and use the cache on many
  • Almost all content is cache-once anyway, and those situations are handled automatically and very fast
  • In the few situations where we need something different per user, we specify that this is the case, and it both caches correctly and is about as fast as possible
  • Anon users all share the same cache, so they are very fast automatically without having to make a special case for them
  • Even requests that are not fully cached are usually 90% cached

This doesn't translate directly to Drupal because there is no real "page" level in the same sense, but I think that node and block caching can still benefit from a similar path mechanism. The block caching goes a long way, but there is still a need for modules like authcache.

I can provide a lot more detail if anyone is interested at all.

Comments

FiReaNGeL's picture

I think this is similar to an idea I proposed a while ago. See http://drupal.org/node/300935. Someone also implemented a two-phased rendering system in Django: http://www.holovaty.com/writing/django-two-phased-rendering/

I agree that this is totally needed for Drupal - alas I don't have the time / ability to code it myself. How should we organize this?

Cacheability Setting

mikeytown2's picture

Let each part/section of the page declare how cacheable it is on a scale of 0-10, with 0 being uncacheable and 10 being fully static HTML. Better yet, make some tests that figure out what is cacheable. I believe Drupal pushes the data out to the theme layer. If the theme layer pulled the data instead, implementing this would be easier, but it would make a lot of the Drupal magic harder. As such, this is an architectural issue. Authenticated User Page Caching (Authcache) tries to get around this by loading Drupal twice (HTML and Ajax) for each request, each one being a minimal Drupal bootstrap, with the net effect of it being a lot faster.

FiReaNGeL's picture

The Django implementation discusses the limitations of an Ajax-based two-phased rendering. First, it breaks if your users have JavaScript off (it's not that rare; I have NoScript for Firefox and allow only the sites I want, so on first page view some sites are 'broken'). Second, it's really ugly, because parts of the page keep coming in after the page has rendered.

No need for a scale rating; let's find a way to identify the parts that are dynamic, don't cache those, and cache all the rest!
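A minimal sketch of that "flag the dynamic parts, cache the rest" idea, in Python and done entirely on the server (so no Ajax is involved). The Part class, its dynamic flag, and render_page are hypothetical names for illustration only; nothing like this exists in Drupal core under these names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Part:
    name: str
    render: Callable[[str], str]   # takes a user name, returns HTML
    dynamic: bool = False          # True means: never cache this part


static_cache: Dict[str, str] = {}  # part name -> cached HTML


def render_page(parts: List[Part], user: str) -> str:
    out = []
    for part in parts:
        if part.dynamic:
            # Dynamic parts are rendered fresh on every request.
            out.append(part.render(user))
        else:
            # Everything else is rendered once and then reused.
            if part.name not in static_cache:
                static_cache[part.name] = part.render(user)
            out.append(static_cache[part.name])
    return "".join(out)
```

The hard part in practice is deciding which parts are truly dynamic; the sketch assumes each part declares that itself.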

Response to Ajax issues...

Jonah Ellison's picture

The non-JS issue is easy to get around--simply don't serve cached pages to users who have JavaScript disabled (which is usually less than 2% of visitors and won't affect overall site performance).

jQuery has the ability to use Ajax synchronously, meaning the browser won't display the page until the Ajax request has been made and a response received. This prevents the "ugly" page rendering. It's also possible to tell the browser to cache the Ajax results (using "max-age" header cache directive), so the only lag time is the first time the element is loaded.
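For the "max-age" part, the server side could look something like this Python sketch. The function name fragment_headers is made up for the example, and adding the "private" directive is my own assumption: it tells shared proxies not to store a per-user fragment, while the user's own browser may still reuse it.

```python
from typing import Dict


def fragment_headers(max_age_seconds: int) -> Dict[str, str]:
    """Response headers for an Ajax-loaded, per-user page fragment."""
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must not be negative")
    return {
        "Content-Type": "text/html; charset=utf-8",
        # The browser may reuse the response for this many seconds
        # without asking the server again.
        "Cache-Control": f"private, max-age={max_age_seconds}",
    }
```

With something like fragment_headers(300), the browser would only hit the server for that fragment again after five minutes.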

kwinters's picture

I've seen both of those, and it does seem conceptually similar to the Django post but I am not familiar enough with the underlying Django structure to make any real judgments.

Really the biggest problem I'm having is getting a good idea of the big picture, particularly in D7. It's a moving target! It seems like there are a lot of new developments in caching, but I have no idea which ones will actually end up in use.

Moshe and many others are already elbow-deep in this process and much more familiar with core code, with a lot of exciting things like hook_nodeapi_post_load and other ways to cache things we couldn't before. So, I probably can't offer a whole lot in terms of patches, but I can offer experience from a different viewpoint. If authenticated users' page views were as fast in Drupal as in our old system, I would be thrilled.

Ken Winters

www.coalmarch.com
