Searchlight vs. Apache Solr Views

nedjo's picture

Currently admins wanting to add Solr to their Drupal site have two options: Apache Solr Views and Searchlight.

Both can be used to construct views that filter content (including faceted search) on the basis of Solr indexes. Both projects - particularly Searchlight - are in active development.

In choosing between the two, a key question is going to be performance. For large sites in particular, a primary reason to build Solr-driven views is to improve performance by reducing costly queries on Drupal's db.

From what I can gather, they compare as follows:

  1. Apache Solr Views bypasses Drupal's database entirely for the view. Using Views 3, Solr is set as the data source and the resulting views are built directly on Solr: both the filtering and the result data come from the Solr indexes.

  2. Searchlight uses a two-stage query involving both Solr and the Drupal db. An initial request is sent to Solr, passing in the search parameters. Solr returns the nid values that match those parameters according to the Solr index. These nids are then used in an SQL select to load data from the Drupal db, resulting in a query similar to the following:

SELECT n.nid, n.title, n.created FROM {node} n WHERE n.nid IN (123, 124, 125, 126);

Of course, the query would be somewhat more complex if it involved e.g. CCK fields shared between multiple content types.
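To make the two-stage pattern concrete, here is a rough sketch in Python. It is not Searchlight's code: the Solr request is replaced by a hypothetical `fake_solr_search()` stub, the table is a simplified stand-in for Drupal's node table, and SQLite is used only so the example runs anywhere. One detail it shows is that an IN (...) clause does not keep the order of the id list, so the relevance order from the search backend has to be restored after the SQL load.

```python
import sqlite3

def fake_solr_search(keywords):
    """Hypothetical stand-in for the Solr request: returns nids in relevance order."""
    index = {"solr": [125, 123, 126], "views": [124]}
    return index.get(keywords, [])

def load_rows(conn, nids):
    """Stage two: load display data from the node table by primary key."""
    if not nids:
        return []
    placeholders = ",".join("?" for _ in nids)
    sql = f"SELECT nid, title, created FROM node WHERE nid IN ({placeholders})"
    rows = {row[0]: row for row in conn.execute(sql, nids)}
    # IN (...) does not preserve list order, so restore the search ranking here.
    return [rows[nid] for nid in nids if nid in rows]

# Simplified stand-in for the node table (assumed columns, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, created INTEGER)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(123, "First", 1), (124, "Second", 2), (125, "Third", 3), (126, "Fourth", 4)],
)

# Stage one (search backend) feeds stage two (database).
results = load_rows(conn, fake_solr_search("solr"))
print([row[1] for row in results])
```

The database never sees the search keywords in this arrangement; it only receives a list of primary keys.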

For Searchlight, the Drupal db query - with pure primary key ID values in the where clause - should be highly optimized in comparison with an SQL query that includes e.g. full text matches on text fields. The heavy lifting is passed to Solr. But Drupal is still queried for the result data. Apache Solr Views in contrast shouldn't touch the Drupal db.
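As a small illustration of why a primary-key where clause should be cheap, the sketch below asks a database for its query plan for both kinds of query. It uses SQLite purely for convenience; a typical Drupal site runs on MySQL or PostgreSQL, whose planners and output differ, and the table here is an assumed, simplified one. Treat it as a way to see the general difference between an index lookup and a table scan, not as a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(i, f"Title {i}", f"Body text {i}") for i in range(1, 1001)],
)

def plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

# Lookup by primary key, as in the second stage of the two-stage approach.
pk_plan = plan("SELECT nid, title FROM node WHERE nid IN (123, 124, 125, 126)")
# Text match on an unindexed column, as a plain SQL search might do.
text_plan = plan("SELECT nid, title FROM node WHERE body LIKE '%solr%'")

print(pk_plan)
print(text_plan)
```

In SQLite the first plan reports a search using the integer primary key, while the second reports a scan of the whole table.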

Is this an accurate summary of the two modules' approaches?

How will the two compare in terms of performance? Will Searchlight get a substantial improvement over a non-Solr view but still fall significantly short of Apache Solr Views due to the reliance on a Drupal db query? Or is a pure nid where clause fast enough to make the difference insignificant? If the answer to this depends on the size of the site or the database being used, what would be typical thresholds--e.g., number of nodes at which a performance difference between the two might become significant?

Is there a way to get Searchlight to fetch full results (not just nids) from Solr?
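At the Solr level, the difference between "nids only" and "full results" comes down to which fields are requested through the `fl` parameter of a select request. The sketch below only builds the two request URLs; the endpoint and the field names (`nid`, `title`, `created`) are assumptions for illustration, and it says nothing about whether Searchlight exposes such an option. Note that Solr can only return fields that are stored in the index, not ones that are merely indexed.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed local Solr endpoint; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/select"

def solr_url(query, fields, rows=10):
    """Build a Solr select URL that asks only for the listed fields."""
    params = {"q": query, "fl": ",".join(fields), "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)

# Ids only: the database must supply everything else.
ids_only = solr_url("drupal", ["nid"])
# Full rows: everything needed for display comes back from Solr.
full_rows = solr_url("drupal", ["nid", "title", "created"])

print(ids_only)
print(full_rows)
```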

Any hints or guidelines would be welcome and helpful for those trying to select between the two.

Thanks!

Comments

Indexing is also different

robertdouglass's picture

Especially when using Sphinx. From what I understand, the Sphinx indexing of the view is pretty much instantaneous whereas the Solr indexing is cron based. Sphinx trumps Solr (in the Drupal world) when indexing speed is important. This is something we need to work on.

Other than that I think your analysis is correct. It's worth noting that node access is easier (more complete) with Searchlight as it uses the normal Drupal mechanisms to handle it. One can easily run into edge cases with ApacheSolr that the apachesolr_nodeaccess module cannot handle.

It's also really neat that Searchlight lets you build any view, including all of the non-node stuff, and search on it.

A real time indexing mode

pwolanin's picture

A real-time indexing mode seems to be available in the Sphinx beta, but it's not clear that existing tools would use it. The standard setup seems to be a main + delta index.

http://www.sphinxsearch.com/docs/current.html#live-updates
