


Comments

Some structural changes I'd like to see

robertDouglass
  • The steps that are taken during indexing (and later during query parsing) should be made atomic and chained, similar to input formats and filters. This would combine the preprocess hook with the text transformations that already happen (stripping punctuation, lowercasing, etc.). I've considered building a prototype with the current filter system.
  • hook_search('name') needs to return more metadata. Ideally modules could provide multiple searches and allow themselves to be configured (such as adding elements to the search form, or defining their own search form). Perhaps other $ops are needed.
  • The type column in all search tables should be removed and each type should maintain its own tables.
  • The keywords should not persist throughout the request in the form of a string, but rather an object that handles adding fields, removing fields, cloning etc. I have such an object that is very near general purpose use in the ApacheSolr module.
  • The service of tracking which nodes need indexing should be generalized so that other node indexing modules can use the same code to track which nodes they need to index. There is code (less elegant than the previously mentioned query builder) in the ApacheSolr module that does this.
  • do_search needs refactoring. Possibly needs to be broken into two or more phases that include callbacks to the caller, or a query object that has defined query building setters. Sending in snippets of raw SQL for multiple phases is a bit confusing.
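
The first bullet's idea of atomic, chained steps could look roughly like the sketch below. It is written in Python rather than Drupal's PHP purely for illustration, and every name in it (`strip_punctuation`, `lowercase`, `collapse_whitespace`, `run_chain`, `INDEX_CHAIN`) is hypothetical, not part of any Drupal API.

```python
import re
from functools import reduce
from typing import Callable, Iterable

# Each step is a small, independent text -> text function (hypothetical names).
Step = Callable[[str], str]


def strip_punctuation(text: str) -> str:
    # Replace anything that is not a word character or whitespace with a space.
    return re.sub(r"[^\w\s]", " ", text)


def lowercase(text: str) -> str:
    return text.lower()


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


def run_chain(text: str, steps: Iterable[Step]) -> str:
    # Apply the steps in order, feeding each output into the next step.
    return reduce(lambda acc, step: step(acc), steps, text)


# The same ordered chain would be applied at index time and at query-parse time.
INDEX_CHAIN = [strip_punctuation, lowercase, collapse_whitespace]
```

Because every step has the same signature, a module could add, remove or reorder steps without touching the others, which is the property borrowed from input formats and filters.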

Improvements

Jeff Veit

I'm concerned with external search engine integration. This is partly a list of thoughts, and partly a response, from notes that I've been making while thinking about it today. Hopefully it's valuable input.

It's useful to have a Drupal framework for the different phases of search. These are the ones that I think need abstractions: tracking index changes, indexing, search form and form validation, query parse, query build, performing the search, displaying the results, displaying further results. I think that these abstractions should be agnostic interfaces which don't directly reflect the Drupal internal search mechanism.

The Core Search module heads in the right direction: separate modules for the search framework and the Drupal search implementation. This makes it much easier to use an external engine.

Tracking index changes - the xapian module does this by building a search queue, and my mnogosearch work is probably going to do the same. It should also be possible to make lightweight external calls to add to the queue - we use vBulletin, and I can imagine a vBulletin call that adds data to the queue.
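
A minimal sketch of such an engine-agnostic queue, in Python for illustration only. The class `IndexQueue` and its methods `mark_dirty` and `claim` are hypothetical names, and a real implementation would presumably live in a database table rather than in memory.

```python
from collections import OrderedDict
from typing import List, Tuple


class IndexQueue:
    """Hypothetical queue of items that need (re)indexing."""

    def __init__(self) -> None:
        # Keyed by (source, item_id) so repeated changes collapse into one entry.
        self._pending: "OrderedDict[Tuple[str, str], None]" = OrderedDict()

    def mark_dirty(self, source: str, item_id: str) -> None:
        # Lightweight call that Drupal code or an external system could make.
        self._pending[(source, item_id)] = None

    def claim(self, limit: int) -> List[Tuple[str, str]]:
        # Hand out up to `limit` items, oldest first, and remove them from the queue.
        batch = list(self._pending)[:limit]
        for key in batch:
            del self._pending[key]
        return batch

    def __len__(self) -> int:
        return len(self._pending)
```

Any indexer (core search, xapian, mnogosearch) could call `claim` during cron, while an external system such as a forum only needs to call `mark_dirty`.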

It would be useful to be able to integrate results from different search methods, and it might be interesting to allow search decoration. For instance, if I start at the standard search box and type 'robertDouglass', the normal search module might show all the pages where Robert has posted. But the user module should be able to say that the most important result is Robert's profile, so that it is ranked at or near the top of the results list.
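
One way to picture this is a merge step that collects results from several providers and lets each provider express importance through its score. The sketch below is Python for illustration only. `Result`, `merged_search`, `node_provider` and `user_provider` are hypothetical, and it assumes scores from different providers are comparable, which real engines would not guarantee without normalisation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Result:
    url: str
    title: str
    score: float


Provider = Callable[[str], Iterable[Result]]


def merged_search(keywords: str, providers: List[Provider]) -> List[Result]:
    # Collect results from every provider, keeping the best score per URL.
    best: Dict[str, Result] = {}
    for provider in providers:
        for result in provider(keywords):
            current = best.get(result.url)
            if current is None or result.score > current.score:
                best[result.url] = result
    # Highest score first.
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


def node_provider(keywords: str) -> List[Result]:
    # Stand-in for the normal content search.
    return [
        Result("/node/1", f"Post by {keywords}", 1.0),
        Result("/node/2", f"Another post by {keywords}", 0.8),
    ]


def user_provider(keywords: str) -> List[Result]:
    # Stand-in for the user module decorating the search with a boosted profile hit.
    return [Result(f"/user/{keywords}", f"{keywords}'s profile", 10.0)]
```

Calling `merged_search("robertDouglass", [node_provider, user_provider])` puts the profile first because the user provider assigned it the highest score.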

Query parsing: when passing the query out to an external engine, query parsing is probably limited. It might be useful to be able to replace particular terms. For instance, mnogosearch uses & and ~ instead of AND and NOT, but to present a consistent interface, AND and NOT should be used.
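
As a rough illustration of that term replacement (Python, not Drupal code): the mapping below uses only the two operators mentioned above, and `translate_operators` and `MNOGOSEARCH_OPERATORS` are hypothetical names. The sketch does not handle quoted phrases.

```python
import re
from typing import Dict

# Assumed mapping, taken from the operators mentioned in the text above.
MNOGOSEARCH_OPERATORS: Dict[str, str] = {"AND": "&", "NOT": "~"}


def translate_operators(query: str, operators: Dict[str, str]) -> str:
    # Replace whole-word, upper-case operators only; ordinary words are untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, operators)) + r")\b")
    return pattern.sub(lambda match: operators[match.group(1)], query)
```

The user always types AND and NOT, and each engine integration supplies its own mapping just before the query is sent out.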

Integration of external engines with content types. Most of the industrial-strength search engines allow data to be partitioned by tags, sections or something similar in order to provide faceted search. But it's not just CCK content types; modules that implement a content type matter too, for instance when searching book content. To make this work, it helps to have special knowledge when indexing, for instance by adding tags, weights or facet info.

"do_search needs refactoring. Possibly needs to be broken into two or more phases that include callbacks to the caller, or a query object that has defined query building setters. Sending in snippets of raw SQL for multiple phases is a bit confusing." do_search needs to be an abstraction that calls the relevant registered search engines. The query should be an object, probably decorated by the query parser, because different search engines have different interfaces. The method of calling an engine should be embodied in a function, because different search engines have different mechanisms. At least three exist already in the wild: through a URL, through a PHP extension call, and through an external process, e.g. a perl call.
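
A minimal sketch of do_search as a dispatcher over registered engines, in Python for illustration. `SearchEngine`, `InMemoryEngine` and `register_engine` are hypothetical, and the in-memory backend merely stands in for an engine that would really be reached via a URL, a PHP extension or an external process.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SearchEngine(ABC):
    """Hypothetical interface; how the engine is called is hidden behind search()."""

    @abstractmethod
    def search(self, keywords: str) -> List[str]:
        ...


class InMemoryEngine(SearchEngine):
    # Stand-in backend for this sketch only.
    def __init__(self, documents: Dict[str, str]) -> None:
        self._documents = documents

    def search(self, keywords: str) -> List[str]:
        terms = keywords.lower().split()
        # A document matches when it contains every term as a substring.
        return [
            doc_id
            for doc_id, text in self._documents.items()
            if all(term in text.lower() for term in terms)
        ]


_ENGINES: Dict[str, SearchEngine] = {}


def register_engine(name: str, engine: SearchEngine) -> None:
    _ENGINES[name] = engine


def do_search(keywords: str, engine_name: str) -> List[str]:
    # The caller never sees how the chosen engine is actually invoked.
    return _ENGINES[engine_name].search(keywords)
```

With this shape, no raw SQL snippets cross the interface: each engine decides internally how to turn the query into whatever its backend expects.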

"The keywords should not persist throughout the request in the form of a string, but rather an object that handles adding fields, removing fields, cloning etc." See above: sometimes the raw keywords are the most useful form. Other times the keywords will need to be combined with special knowledge when using an external search engine, for example when we are trying to search users. The framework should be agnostic and definitely shouldn't throw away or hide information.
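
A query object can satisfy both views if it keeps the raw keywords alongside any structured fields. The sketch below is Python for illustration, and `SearchQuery` is a hypothetical class, not the one from the ApacheSolr module.

```python
import copy
from typing import Dict, List


class SearchQuery:
    """Hypothetical query object that never discards the raw keywords."""

    def __init__(self, keywords: str) -> None:
        self.keywords = keywords  # raw user input, kept as typed
        self.fields: Dict[str, List[str]] = {}

    def add_field(self, name: str, value: str) -> "SearchQuery":
        self.fields.setdefault(name, []).append(value)
        return self

    def remove_field(self, name: str) -> "SearchQuery":
        # Removing a field that was never added is not an error.
        self.fields.pop(name, None)
        return self

    def clone(self) -> "SearchQuery":
        # Deep copy so changes to the clone never leak back into the original.
        return copy.deepcopy(self)
```

An engine that only understands plain keywords reads `keywords`, while one that supports partitions can also read `fields`, for example a `type` of `user` when searching users.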

Different search engines WILL return results in different ways, so perhaps the framework should have multiple levels of customisation. For instance: return a whole page without breaking it into individual results and theming them; or return individual results to be themed.
