Improving Drupal's Search Speed Under Load Conditions

Trellon recently released the Xapian search module for Drupal. The module seamlessly replaces Drupal's standard search feature with an interface to the Xapian search engine, apart from requiring a core patch. There's a post about it over on Trellon.com.

One of our goals is to support high-performance use cases. By separating search queries from normal database traffic, we hope to reduce overall database load for a category of data access that is hard to cache. One point we discuss is the power-law distribution of search terms. Users gravitate toward a handful of popular terms, but there is a long tail of rarely used terms whose results are seldom cached. These long-tail requests tend to be expensive and frequently muck up database performance. They also push site owners to look for other ways of providing search on their sites.
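To see why the long tail resists caching, here is a back-of-the-envelope sketch. It assumes that term popularity follows a Zipf law with exponent 1 and that a perfect cache always holds the most popular terms. The vocabulary size and cache size below are hypothetical round numbers, not measurements from our benchmarks.

```python
def harmonic(n: int, s: float = 1.0) -> float:
    """Generalized harmonic number H(n, s) = sum over k = 1..n of 1 / k**s."""
    return sum(1.0 / k**s for k in range(1, n + 1))


def top_k_share(k: int, n: int, s: float = 1.0) -> float:
    """Fraction of all queries that land on the k most popular of n terms,
    assuming term popularity follows a Zipf law with exponent s."""
    return harmonic(k, s) / harmonic(n, s)


distinct_terms = 100_000  # hypothetical number of distinct search terms
cached_terms = 1_000      # hypothetical number of cached result sets (top 1%)

share = top_k_share(cached_terms, distinct_terms)
print(f"queries served from cache: {share:.1%}")
print(f"queries that still hit the database: {1 - share:.1%}")
```

Under these assumptions, caching the top 1% of terms covers only about 62% of queries. The remaining 38% or so are spread thinly across the other 99,000 terms. That is exactly the traffic that keeps reaching the database, even with a generous cache.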

While our benchmarks are preliminary, they do indicate a performance advantage from using the Xapian search engine. On the tiny development server we used for benchmarking, we saw an average 42% improvement on the queries that actually fetch search data. We also saw a significant improvement in uncached page generation times, although the results there vary, and we are very interested in hearing other people's thoughts.

In terms of implementation, the Xapian search module currently requires a core patch. We hope to avoid that in Drupal 7 with a replaceable search.inc file, similar to cache.inc. Xapian can run on any server on the network and can index multiple sites simultaneously. It is a little fussy, and it generally returns more search results than Drupal's search module. It also supports logical operators for advanced searches.
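The search.inc idea amounts to a swappable search backend chosen by configuration, much as cache.inc can be replaced. The Python sketch below illustrates that pattern under our own assumptions. `SearchBackend`, `InvertedIndexBackend`, `get_search_backend` and the `search_backend` setting are hypothetical names, not Drupal or Xapian APIs. The toy backend's AND / OR / NOT handling is only a stand-in for the far richer query parser Xapian provides: it evaluates operators strictly left to right, with no precedence.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import defaultdict


class SearchBackend(ABC):
    """Hypothetical interface that a replaceable search.inc could expose."""

    @abstractmethod
    def index(self, doc_id: int, text: str) -> None: ...

    @abstractmethod
    def search(self, query: str) -> set[int]: ...


class InvertedIndexBackend(SearchBackend):
    """Toy in-memory inverted index supporting AND / OR / NOT.

    Operators are applied left to right with no precedence; a term with
    no explicit operator before it is combined with AND.
    """

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.all_docs: set[int] = set()

    def index(self, doc_id: int, text: str) -> None:
        self.all_docs.add(doc_id)
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        result: set[int] | None = None
        op = "AND"
        for token in query.split():
            if token in ("AND", "OR", "NOT"):
                op = token
                continue
            matches = self.postings.get(token.lower(), set())
            if result is None:
                # First term: a leading NOT means "every document except these".
                result = self.all_docs - matches if op == "NOT" else set(matches)
            elif op == "AND":
                result &= matches
            elif op == "OR":
                result |= matches
            else:  # NOT
                result -= matches
            op = "AND"  # default operator for the next bare term
        return result or set()


# Registry of available backends; a Xapian-backed class would be added here.
BACKENDS = {"inverted_index": InvertedIndexBackend}


def get_search_backend(settings: dict) -> SearchBackend:
    """Instantiate the backend named by a hypothetical 'search_backend' setting."""
    name = settings.get("search_backend", "inverted_index")
    return BACKENDS[name]()


backend = get_search_backend({"search_backend": "inverted_index"})
backend.index(1, "Drupal search module")
backend.index(2, "Xapian search engine")
backend.index(3, "Drupal cache layer")
print(sorted(backend.search("drupal AND search")))   # [1]
print(sorted(backend.search("drupal OR xapian")))    # [1, 2, 3]
print(sorted(backend.search("search NOT drupal")))   # [2]
```

With this design, the calling code only ever talks to `SearchBackend`. Moving search off the main database then becomes a configuration change rather than a code change.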

Overall, we expect the module to improve database performance on sites that use it. It should also make it easier to index collateral content such as PDFs and Word documents. Finally, it should allow search interfaces that bypass the Drupal bootstrap process altogether. We have included a number of benchmarks in the blog post and would appreciate any testing, thoughts, or brutish complaints anyone cares to share.

M

Comments

Cross-post?

David Lesieur

I suggest that you cross-post this report to the Search group as well. ;)

High performance