Request for feedback on best practices in overcoming query performance problems in D6 and beyond

dwightaspinwall's picture

It is well-known that many very useful queries that you create (on your own or via the Views module) are costly to execute because MySQL creates a temporary table and does a filesort. A common example is {node}.type='xxx' and {node}.status='yyy' order by {node_comment_statistics}.comment_count desc.

Another common example is sorting posts in reverse chron order of most recent comment timestamp, which the Views module implements as order by GREATEST({node}.changed, {node_comment_statistics}.last_comment_timestamp) DESC

This problem is described at http://drupal.org/node/148849 as well as other places. The best (MySQL-specific) solution seems to be the creation of a compound index which includes the fields from the where and order by clauses. A generalized solution which takes this approach can be found at http://groups.drupal.org/node/17644

In my company's case, queries of the above form execute in 1-2 seconds on fast hardware running D6 with 70,000 nodes. Many pages on our site contain Views blocks running two or more of these pathological queries. Without optimization, page load time is unacceptable, as you can imagine.

Using the Tracker2 module for guidance, we have written a custom optimization module which creates and manages fields in {node}, exposes them to Views, and results in MySQL execution plans which are two orders of magnitude faster. Like the Tracker2 module, the module makes use of hook_nodeapi and hook_comment to ensure that these duplicate fields stay in sync with their masters in {node_comment_statistics}.

There are additional optimizations of this type that need to be made by us. So before going too far my questions are:

  1. What is the state of the Materialized Views API (MV)? Are there plans to create a backport of this for Drupal 6? If not, does it make sense to do so?

  2. If it does not make sense to backport MV, is there a D6-specific module (besides Tracker2, which is very specific to tracker functionality) which implements a reasonable set of optimizations in a general way (i.e. in a way many Drupal sites could benefit from)? If not, does it make sense to write one?