Zend Search for Drupal

jsloan's picture

Although this group focuses on Drupal integration with external Java Lucene indexers, I've been invited to join it to announce a PHP implementation of Lucene search for Drupal. So this will be a slightly different direction for the group.

I've begun work on a 4.7 module that implements Lucene search from the Zend Framework. The framework is still a moving target: limited UTF-8 support was introduced last week, and the mailing list offers this hint: "Full utf-8 support is planned, but will come later."

My intention is to develop a contributed module for Drupal that will have its own API for creating, updating and searching a local Lucene index. Integration with the existing core search module may be possible, but it seems that the architecture of that module precludes this from being a simple extension. Please correct me if I am wrong in this conclusion.

The initial design of the module implements the index update within the nodeapi hook. So all indexing is done in real time and results are available immediately. There is also a cron hook for indexing in a batch mode for cases of index initialization or rebuilds, batch content import, and for sites that do not want to index in real time.
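
A minimal sketch of this two-path design (index immediately from the content hook, or queue the work for cron). It is written in Python rather than PHP purely for illustration; `ToyIndex`, `Indexer`, `on_node_event` and `run_cron` are hypothetical names and do not correspond to the Drupal or Zend Framework APIs.

```python
from collections import deque


class ToyIndex:
    """Stand-in for a Lucene index: maps a document key to its text."""

    def __init__(self):
        self.docs = {}

    def upsert(self, key, text):
        self.docs[key] = text

    def delete(self, key):
        self.docs.pop(key, None)


class Indexer:
    """Routes content events either straight to the index or to a cron queue."""

    def __init__(self, index, realtime=True):
        self.index = index
        self.realtime = realtime
        self.pending = deque()  # operations waiting for the next cron run

    def on_node_event(self, op, nid, text=""):
        """Called from the content hook for 'insert', 'update' and 'delete'."""
        if self.realtime:
            self._apply(op, nid, text)
        else:
            self.pending.append((op, nid, text))

    def run_cron(self, limit=100):
        """Batch path: apply at most `limit` queued operations per run."""
        done = 0
        while self.pending and done < limit:
            self._apply(*self.pending.popleft())
            done += 1
        return done

    def _apply(self, op, nid, text):
        key = f"node:{nid}"
        if op == "delete":
            self.index.delete(key)
        else:
            self.index.upsert(key, text)
```

With `realtime=True` a saved node is searchable as soon as the hook returns; with `realtime=False` the same events accumulate and are drained in bounded batches, which is also the natural path for an initial build or a full rebuild.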

My first approach is to index nodes and comments separately, each with its own document structure and key fields. I think that a robust API will allow the indexing of anything (nodes, comments, taxonomy, menus, feeds, etc.), but this is yet to be developed.
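
One way to picture separate document structures with their own key fields, again as a Python sketch rather than module code. The field names (`nid`, `cid`, `title`, `subject`, `body`) and the `to_fields` helper are assumptions made for illustration, not the module's actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class NodeDoc:
    nid: int
    title: str
    body: str

    @property
    def key(self):
        # Unique key for a node document within the index.
        return f"node:{self.nid}"


@dataclass
class CommentDoc:
    cid: int
    nid: int  # the node this comment belongs to
    subject: str
    body: str

    @property
    def key(self):
        # Comments get their own key space so they never collide with nodes.
        return f"comment:{self.cid}"


def to_fields(doc):
    """Flatten a document into the name/value fields an indexer would store."""
    fields = asdict(doc)
    fields["key"] = doc.key
    fields["type"] = type(doc).__name__
    return fields
```

Prefixing the key with the content type keeps node 7 and comment 7 distinct in a single index, and adding a new content type later only requires another small document class.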

This module will depend on the [[http://framework.zend.com/|Zend Framework]], of which the Lucene search is just one of many components. Integration of the Zend Framework will be an issue of its own. For this reason, may I suggest that a Zend Framework "group" be created and that Drupal implementations of Zend Framework components be discussed there.

Comments

This group is for you

robertdouglass's picture

Literally =)

It isn't clear from your description what your code will do. Could you describe your motivation for this code? Its advantages/disadvantages over built-in Drupal search? What's the status of your code, and what's your roadmap? Thanks!

It will be an alternative to the core search module.

jsloan's picture
  1. Indexing of all content
  2. Simple search of the index
  3. Advanced search of the index
  4. API for specialized content indexing
  5. Support for multiple indexes

Good answers!

robertdouglass's picture

That's what I'm hoping for. Last questions: 1) There are many implementations of Lucene. Will your code work with all of them (given the proper bridges, e.g. PHP<-->Java)? 2) How will your code work with non-Drupal index searching (say, with the Lucene index produced by Nutch)?

Lucene binary compatibility

jsloan's picture

The intention of the Zend port of Lucene is to have binary compatibility with all platforms.

Internally I am testing the IBM/Yahoo OmniFind, but rather than have a spider crawl our Drupal sites, I am investigating the interoperability of the PHP and Java Lucene indexes. In preparation for this, I knew that I would need a good Lucene indexer for each site.

I'll let you know for sure, but theoretically the indexes created by one should be readable (searchable) by the other.

Current status

jsloan's picture

It is running on my laptop (Mepis/Ubuntu, Apache2, PHP5) using Drupal 4.7.6 and the latest nightly export of the Zend Framework from SVN. My next step is to introduce it to our Intranet, where we have 30+ sites and 5000+ users.

It is currently cast in my own "get it running" structure, borrowing heavily from the existing search module. I intend to refactor for Drupal style and then, with group input, recast it in a more workable structure.

