Towards a Generalized Drupal Object Caching Mechanism

joshk's picture

In my never-ending quest for greater Drupal Glory, I've been spending the past year boning up on the various ways to improve site performance and address issues of scalability. Today, while noodling with Amazon EC2 instances (and remaining unconvinced about their raw performance as potential master database servers), I had a thought:

What would it take to extend the static node cache in node_load() beyond the individual Drupal bootstrap?

Like just about everyone else, I've been loving how much memcached helps speed site performance. It simply rocks, and everyone looking to reduce server load and speed page responses should be looking into it. One of the better things about it is that the PHP client can store and return data objects directly: you are letting PHP pull something out of a lightning-fast memory cloud, and your own code never has to call unserialize() on a string to get an object or array back. (The client extension still serializes values behind the scenes, so that cost moves into the client rather than disappearing.)

This led me to my thought. If you want a massively scalable interactive Drupal site, you need ready access to tons of nodes. Inevitably you will hit the wall with logged-in requests for these from your database. But what if we were able to take the performance boost we get from node_load()'s static cache, and make it work persistently across an entire site, rather than just for one pageload?

This presents an attractive roadmap for high-performance Drupal architecture: using the database as the persistent data store, with tables optimized around high-performance searching/sorting using the methods outlined in David Strauss's excellent DNA module, and maintaining a memcached-based cloud of fully-built node objects on top of that.

Outlining the code here is pretty simple. A module with a low weight would implement hook_nodeapi() to handle clearing the node cache on insert and update, and filling it on load. Then a simple wrapper around the default node_load() would check said cache first before going to the database.
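
To make that outline concrete, here is a minimal, language-neutral sketch in Python rather than actual Drupal code. Everything in it is an assumption for illustration: `NodeStore` stands in for the database, a plain dict stands in for memcached, and `CachedNodeLoader` plays the role of the wrapper around node_load() plus the insert/update handling that hook_nodeapi() would do.

```python
import copy


class NodeStore:
    """Hypothetical stand-in for the database (the persistent source of truth)."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts how often the "database" is actually queried

    def load(self, nid):
        self.reads += 1
        row = self.rows.get(nid)
        return copy.deepcopy(row) if row is not None else None

    def save(self, node):
        self.rows[node["nid"]] = copy.deepcopy(node)


class CachedNodeLoader:
    """Hypothetical wrapper: check the shared cache first, then the database."""

    def __init__(self, store, cache=None):
        self.store = store
        # A plain dict stands in for memcached here (an assumption).
        self.cache = {} if cache is None else cache

    def node_load(self, nid):
        node = self.cache.get(nid)
        if node is None:
            node = self.store.load(nid)
            if node is not None:
                self.cache[nid] = node  # fill the cache on load
        # Return a copy so callers cannot mutate the cached object.
        return copy.deepcopy(node) if node is not None else None

    def node_save(self, node):
        self.store.save(node)
        self.cache.pop(node["nid"], None)  # clear the cache on insert/update


store = NodeStore()
loader = CachedNodeLoader(store)
loader.node_save({"nid": 1, "title": "Hello"})
loader.node_load(1)  # cache miss: goes to the database
loader.node_load(1)  # cache hit: the database is not touched
```

After the two loads, `store.reads` is 1: only the first load reached the database. Saving the node again drops the cached copy, so the next load rebuilds it from the store.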

In a perfect world, this would be baked into core, and such a thing might be possible for 7.x. However, for experienced Drupal developers, the kind of patching needed to implement this under 6.x would be even simpler than the serialization diffs associated with memcache under 5.x. It could literally be a couple of lines after the static check in node_load().

Should such a project emerge and mature, it could develop into an overall object-cache for all the core Drupal elements (e.g. users, taxonomy terms, etc). Along with the new schema architecture (which lays the groundwork for application segmentation/sharding) and testing framework, a high-performance object-cache would position Drupal 7 to take the enterprise by storm.

Any responses?

Comments

Node caching in advcache module

robertdouglass's picture

Hey Josh,

Are you talking about the type of node caching that is done with the patch from the advcache module? Or something different? Taxonomy terms/vocabularies, comments, and a few other things are also cached with advcache. Mike O'Connor just posted a patch to port it to D6 as well. All of the stuff that gets cached uses the standard cache API, so it can all go straight into memcache (either with the memcache module or cacherouter).

http://drupal.org/project/advcache
http://drupal.org/node/242121#comment-1146201

Indeed!

joshk's picture

Hey Robert,

Yes! Well, I guess it should come as no surprise that this work is already underway in the community. :)

So rather than wasting time on my own module here, I will start working on a D6 version of advcache. Expect patches.

Beyond that, although I'm clearly late to the party, I'd really like to hear people's ideas/responses to this kind of concept. I've been spurred in this direction lately by a few projects and also by preparing for my presentation on handling asynchronous data scalably/securely.

Getting more best practices out there on these topics would benefit the overall community, I'm sure.

http://www.chapterthree.com | http://www.outlandishjosh.com

Great! Are you a D7 champion?

robertdouglass's picture

The patches that are currently in the D6 port from Mike O'Connor have had a fair amount of vetting in D5. Now would be the time to start championing for them to get into D7. It's a one-at-a-time proposition, and requires all the patience of core development, plus benchmarking. Maybe you can step in and help champion the patches for D7 inclusion?

Your ideas are right on

robertdouglass's picture

but caching things takes hard work. You have to be able to prove two things: 1) that it is faster, and 2) that it doesn't break anything by serving stale or broken objects. Therefore, given today's standards for core coding, any core caching patches have to come with unit tests and benchmarks.
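
As a rough illustration of those two proofs, here is a small Python sketch, not a Drupal test or an actual core benchmark. All names (`slow_db_load`, `cached_load`, `save`, `benchmark`) are hypothetical, and the 1 ms sleep is an assumed stand-in for query latency.

```python
import time
import unittest

DB = {1: {"nid": 1, "title": "Original"}}  # hypothetical "database"
CACHE = {}                                 # hypothetical shared object cache


def slow_db_load(nid):
    time.sleep(0.001)  # simulated query latency (an assumption)
    row = DB.get(nid)
    return dict(row) if row else None


def cached_load(nid):
    if nid not in CACHE:
        CACHE[nid] = slow_db_load(nid)
    return dict(CACHE[nid]) if CACHE[nid] else None


def save(node):
    DB[node["nid"]] = dict(node)
    CACHE.pop(node["nid"], None)  # invalidate so readers never see the old copy


def benchmark(loader, nid, rounds=50):
    """Return the wall-clock seconds taken by `rounds` calls to `loader`."""
    start = time.perf_counter()
    for _ in range(rounds):
        loader(nid)
    return time.perf_counter() - start


class StaleObjectTest(unittest.TestCase):
    def test_update_is_visible(self):
        cached_load(1)  # warm the cache with the old title
        save({"nid": 1, "title": "Edited"})
        self.assertEqual(cached_load(1)["title"], "Edited")
```

Comparing `benchmark(slow_db_load, 1)` against `benchmark(cached_load, 1)` covers the first proof, and the test case covers the second. A real patch would of course measure against an actual database and cache backend.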

Point well taken

joshk's picture

That's very true.

I can't promise anything, but getting a general object-level cache into core would be a major boon. Perhaps the history of block-caching shows the way...

http://www.chapterthree.com | http://www.outlandishjosh.com

And the toughest node to cache...

Scott Reynolds's picture

...is a poll node. So any code you write for this needs to address polls. I no longer know whether this is handled in Adv Cache, and I don't remember the tricks that were involved with it. I do know that on a couple of the sites I have deployed with memcached nodes, I don't cache the polls for authenticated users.

Yup, got it covered.

robertdouglass's picture

There is a no-cache option that you can set for any content type, and poll node is included (or rather excluded) by default. See here: http://tinyurl.com/bm25c8
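
The general idea of a per-type exclusion can be sketched like this in Python. This is not advcache's actual implementation. The names `NO_CACHE_TYPES`, `is_cacheable`, and `maybe_cache` are made up for illustration, and a dict again stands in for the cache backend.

```python
# Content types that should never be served from the object cache.
# "poll" is excluded by default, mirroring the behaviour described above.
NO_CACHE_TYPES = {"poll"}


def is_cacheable(node, no_cache_types=NO_CACHE_TYPES):
    """Return True unless the node's type is on the no-cache list."""
    return node.get("type") not in no_cache_types


def maybe_cache(cache, node, no_cache_types=NO_CACHE_TYPES):
    """Store the node in the cache only if its type allows it."""
    if not is_cacheable(node, no_cache_types):
        return False
    cache[node["nid"]] = node
    return True


cache = {}
stored_story = maybe_cache(cache, {"nid": 1, "type": "story", "title": "News"})
stored_poll = maybe_cache(cache, {"nid": 2, "type": "poll", "title": "Vote!"})
```

Here the story node ends up in `cache` and the poll node does not, so every poll load falls through to the database and picks up current vote counts.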

folks - catch is way far

moshe weitzman's picture

Folks, catch is way far along on this. See the D7 patch at http://drupal.org/node/111127. It is very close; reviews wanted.

shmop?

jason.fisher's picture

I have seen very little discussion on using shmop (http://us3.php.net/manual/en/ref.shmop.php) with Drupal.

It seems that it would be an ideal candidate for exposing caching hooks and storing high-level $node objects persistently between sessions/users?

Object Caching in DRUPAL-7: Core Issues To Note

joshk's picture

Anyone interested in object-level caching for Drupal 7 should read up on these core issues:

http://drupal.org/node/111127

and

http://drupal.org/node/439186

There's movement right now with a number of core contributors, and if you have ideas, opinions (or patches!), now's the time to chime in.

http://www.chapterthree.com | http://www.outlandishjosh.com

