SearchAPI Module

Posted by BlakeLucchesi

The following is my first revision of a proposal to create a search API module. I'd love to get some feedback.

Project Details
A Drupal search API would separate the search interface that end users interact with from the back-end indexing and retrieval work that a search engine performs. The advantages of creating a search API are:
* It will allow a site administrator to offer a number of different search interfaces to site users. Imagine a separate search page that searches only certain node types, or one interface that uses a Solr back end and another that uses a Drupal-based search.
* It will allow larger websites to integrate a more full-featured search engine, such as Solr or Sphinx, into their existing Drupal sites.
* It will allow “plug and play” search engines to declare features that the core search API can use to present information to end users, for example:
  * Spelling suggestions
  * Search facets
  * Results related to a given result
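The separation described above can be sketched as a small registry: each back-end registers itself and declares the optional features it supports, and each front-end search interface is bound to one back-end by name. This is a minimal illustration under assumptions, written in Python for brevity; a real Drupal module would be PHP built on hooks, and every name below (`SearchBackend`, `SearchRegistry`, `SearchInterface`, the `"spelling"` feature string, the `(node id, node type)` result shape) is hypothetical, not an existing Drupal API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical result shape for this sketch: (node id, node type).
Result = Tuple[int, str]


@dataclass
class SearchBackend:
    """Hypothetical pluggable engine (database-backed, Solr, ...)."""
    name: str
    features: FrozenSet[str]                 # optional capabilities, e.g. "spelling"
    search: Callable[[str], List[Result]]    # query -> matching results
    suggest: Optional[Callable[[str], str]] = None  # spelling suggestion, if supported


class SearchRegistry:
    """Back-ends register here; front-end interfaces look them up by name."""

    def __init__(self) -> None:
        self._backends: Dict[str, SearchBackend] = {}

    def register(self, backend: SearchBackend) -> None:
        if backend.name in self._backends:
            raise ValueError(f"backend {backend.name!r} is already registered")
        self._backends[backend.name] = backend

    def get(self, name: str) -> SearchBackend:
        return self._backends[name]


@dataclass
class SearchInterface:
    """A search page configured by an administrator and bound to one back-end."""
    title: str
    backend_name: str
    node_types: FrozenSet[str] = frozenset()  # empty set means "all node types"

    def run(self, registry: SearchRegistry, query: str) -> dict:
        backend = registry.get(self.backend_name)
        results = backend.search(query)
        # Restrict to the node types this interface was configured for.
        if self.node_types:
            results = [r for r in results if r[1] in self.node_types]
        page = {"title": self.title, "results": results}
        # Only render optional features the back-end has declared.
        if "spelling" in backend.features and backend.suggest is not None:
            page["suggestion"] = backend.suggest(query)
        return page


# Example: one simple database-style back-end shared by two interfaces.
DOCS = [(1, "page", "drupal search api"), (2, "story", "solr search backend")]


def db_search(query: str) -> List[Result]:
    return [(nid, ntype) for nid, ntype, text in DOCS if query in text]


registry = SearchRegistry()
registry.register(SearchBackend("database", frozenset(), db_search))

everything = SearchInterface("Search", "database")
pages_only = SearchInterface("Search pages", "database", frozenset({"page"}))
```

The interface never assumes a feature exists: it checks the back-end's declared `features` first, so the same page works on a minimal database back-end and on a richer engine that offers spelling suggestions.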

David Lesieur's Faceted Search module already provides a great interface for site administrators to create different search ‘interfaces’ for their users. These interfaces can be configured to let users search only specific content types, include content from only specific taxonomies, etc.

Benefits to the Community
As Drupal continues to grow, so does the amount of content on many of the sites it powers. There is a need for a better-performing solution than the simple SQL-powered search that Drupal currently ships with. This module will let smaller websites keep using a lighter search engine powered by their Drupal database, while giving larger websites the ability to use their enterprise search engines without risking interference with Drupal core search.

Other community topics regarding the creation of such a module can be found below:
* http://boldsource.com/articles/advancing-drupal-search
* http://groups.drupal.org/node/10128
* http://robshouse.net/2008/03/05/event/drupalcon-boston-solr-bof

Deliverables
The final product will be a self-contained search API module that allows search engines to register themselves with a Drupal site. The search API will handle the user interface for creating search interfaces.
Because this module will rely on data supplied by each individual search engine module, it will be important to include SimpleTest scripts that ensure the module works as expected and provides the functionality a wide range of search engines need in order to interact with it. These tests will be useful not only for testing the API but also for showing search engine developers how their code can interact with the API.
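The kind of shared test described above could be organized as a reusable "contract": one set of checks that every engine must pass, which each engine module reuses by supplying its own engine. The sketch below runs that contract against an in-memory mock engine. It is Python for illustration only (Drupal's SimpleTest framework is PHP), and the engine contract it checks (an `index(nid, text)` method and a `search(query)` method returning sorted node ids, with all query words required to match) is an assumption, not a settled API.

```python
import unittest
from typing import Dict, List, Set


class MockSearchEngine:
    """Hypothetical minimal engine: indexes node text and matches whole words."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[int]] = {}

    def index(self, nid: int, text: str) -> None:
        for word in text.lower().split():
            self._index.setdefault(word, set()).add(nid)

    def search(self, query: str) -> List[int]:
        words = query.lower().split()
        if not words:
            return []
        # AND semantics: a node must contain every query word.
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(hits)


class SearchEngineContract:
    """Shared checks; each engine module mixes this into a unittest.TestCase."""

    def make_engine(self):
        raise NotImplementedError

    def setUp(self) -> None:
        self.engine = self.make_engine()
        self.engine.index(1, "Drupal search API")
        self.engine.index(2, "Solr search backend")

    def test_finds_indexed_word(self) -> None:
        self.assertEqual(self.engine.search("search"), [1, 2])

    def test_all_words_must_match(self) -> None:
        self.assertEqual(self.engine.search("solr search"), [2])

    def test_unknown_word_returns_nothing(self) -> None:
        self.assertEqual(self.engine.search("sphinx"), [])

    def test_empty_query_returns_nothing(self) -> None:
        self.assertEqual(self.engine.search(""), [])


class MockEngineTest(SearchEngineContract, unittest.TestCase):
    def make_engine(self):
        return MockSearchEngine()
```

Because `SearchEngineContract` is not itself a `TestCase`, it is never run on its own; an engine author would add a subclass like `MockEngineTest` that overrides `make_engine` to return their engine, and immediately see which parts of the contract it satisfies.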

Project Schedule
The following is a proposed schedule for the project that will also include weekly updates to the community and my mentor.
May 8-11: Drupal Search Sprint in Minnesota. I have arranged to participate in the search sprint at the University of Minnesota.
May 26: SoC Official Coding Starts. By the start of coding I will have an outline of hooks and functions that will need to be implemented in the core API so that other search modules can interface with it efficiently.
June 5: Simpletest patterns will be developed to represent a typical search engine plugin so that development on the API can be tested effectively.
July 14: Midterm evaluation. By the midterm review I should have a working implementation of the search API, so that the community can test it and give feedback on a working module.
August 11: Last week for code completion. Between the midterm and the final evaluation I will work to complete a themable user interface for the search API, giving search engines the opportunity to define things like spelling suggestions and advanced search forms.

Bio
My name is Blake Lucchesi and I participated with Drupal in SoC 2007, where I was responsible for creating the Fuzzy Search module. While working on that project I realized that the core Drupal search implementation allowed little flexibility in controlling the core search indexing procedure, so I had to write my own search implementation, which duplicated a lot of basic functionality such as word tokenizing and user interface handling.

After further discussion at this past DrupalCon, many agreed that developing a better core API for search is a good route to take because websites use search in such a wide variety of ways. A better search API will allow programmers and non-programmers to extend their search functionality easily and without the risk of breaking core Drupal search.

My other involvements in the Drupal community include contributing the Ubercart Coupons module, a module that connects the Wordfilter and Workflow-ng modules, contributing a few patches to allow user names to be searchable by the core search module, presenting at local group meetups and writing various mini tutorials on my blog.

Comments


Posted by cwgordon7

What's wrong with the existing search module to serve as an API?

Several things.

Posted by robertdouglass

Without commenting on the proposed API, I want to note that the existing search API has a series of (mostly minor) limitations that prevent really rich search solutions from being built on it, especially as soon as you want to move beyond Drupal's built-in indexing mechanism. What we want to move towards is independence between the front-end search builder and the back-end search implementor. A lot of where we're going might be achieved with really incremental changes, but we might also find that some bigger blocks need to be pushed around. Blake and Nedjo are putting stakes in the ground so that we can look at the options.

Here's another stake in the ground

Posted by robertdouglass

http://groups.drupal.org/node/10128

Blake, if you haven't already you'll want to join the search group. http://groups.drupal.org/node/4102

Posted by robertdouglass

Blake, this is a tough call. What you're proposing here is basically to implement whatever we concoct at the May search sprint. While I really like that idea at some level, it might be a tough sell to the people who are evaluating your SoC application. In your position right now, I would try to narrow the deliverables down to something less comprehensive and abstract. Take a close look at the UI in Faceted Search and find one or two of the features that are so great about it. Pick features that fit with core search as we know it today and make implementing those features for core search your project.

It's ok that you'll be working on SoC in the context of the larger search effort that you're also part of, but you have to take care that what you propose for SoC doesn't depend on our May sprint in any way. Make sense?

As an SoC project I think your proposal here is too broad and too ambitious. I like to see projects that have really narrow focus and allow for creating polished code by the end of summer.

EDIT: And you have to answer cwgordon7's question in your application because every reviewer will ask it. Right now it sounds like you're proposing to rewrite core search from scratch, whereas our May sprint will most certainly take core search as our starting point.

Thanks for the comments

Posted by BlakeLucchesi

Thanks for the comments, Gordon and Robert. I agree that it does have quite a large and ambiguous scope. I will rewrite the proposal to cover a specific set of features that I will ensure get developed into the new API that is going to be worked on. I think that would be a good way to define the deliverables. I'll post a new revision later this evening.

Group: Lucene, Nutch and Solr