Test spelling suggestions for D7

group: Search
robertDouglass - Wed, 2008-04-16 17:49

http://drupal.org/node/247482

This patch adds spelling suggestions to the page that is returned if no search results are found.
Spelling suggestions

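This is done using the Levenshtein algorithm, which measures how near two words are by counting the single-character insertions, deletions, and substitutions needed to turn one into the other.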
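As a rough illustration of the distance calculation (a minimal Python sketch, not the code from the patch, which is PHP), the classic dynamic-programming form looks like this. The function name here is chosen for the example only.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    # previous[j] is the distance between the prefix of a handled so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("drupal", "drupla"))
```

A swapped pair of letters counts as two edits under this definition, so a lower number means a nearer word.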

The process for accomplishing spelling suggestions is:

  1. Create a new database table, search_spellings. It has two columns, word and characters. word is a unique word found in the search dataset (i.e. not subject to processing like stemming). characters is the character count of the word. characters is used to restrict the set of words against which a spelling suggestion is sought, following the logic that a 3-letter word can't be a misspelling of a 10-letter word.
  2. On indexing, the words in the search_dataset are added to search_spellings making a globally unique list of words in the search index. A shutdown function is used for this purpose to avoid duplicate database queries.
  3. A new $op for hook_search has been added, 'no results'. This gives modules a chance to set their own message when no results are found. If the $op isn't implemented, the famous blue smurfs message is seen. node_search now uses the 'no results' op to look for spelling suggestions. It uses the new search API function search_spelling_suggestion($word), which returns the top spelling suggestion and its score.
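
The steps above can be sketched roughly as follows. This is a Python illustration, not the patch's PHP code: the dict stands in for the search_spellings table, and the helper names, the length window of 2, the skipping of exact matches, and the use of raw edit distance as the score are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def add_to_spellings(spellings, indexed_words):
    """Step 2: keep one entry per unique word, with its character count."""
    for word in indexed_words:
        spellings[word] = len(word)  # dict keys are unique, like the word column


def spelling_suggestion(word, spellings, max_length_diff=2):
    """Step 3: return (suggestion, score) or None. Lower score means nearer.

    max_length_diff is a hypothetical cutoff; the post does not state one.
    """
    best = None
    for candidate, characters in spellings.items():
        # Use the stored character count to rule out words of very
        # different length before doing the more expensive comparison.
        if abs(characters - len(word)) > max_length_diff:
            continue
        if candidate == word:
            continue  # assumption: an exact match is not a suggestion
        score = levenshtein(word, candidate)
        if best is None or score < best[1]:
            best = (candidate, score)
    return best


spellings = {}
add_to_spellings(spellings, ["drupal", "module", "search", "drupal"])
print(spelling_suggestion("serach", spellings))
```

In a real table the length restriction would presumably be part of the query, so that only candidates of similar length are fetched at all.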

Please note that the spelling suggestions don't in any way try to guarantee that the words are actually spelled correctly, but rather that if you type in a near miss, the system will suggest an alternate spelling based on what is in the index.

I didn't try to implement spelling suggestions on pages that have search results (a la Google), but that could be done later.


robertDouglass - Tue, 2008-05-06 08:51

The latest patch has a Unicode-safe implementation of the Levenshtein algorithm that is an improvement over the PHP implementation.
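
To show why Unicode safety matters here, the following Python sketch compares the same pair of words once as characters and once as UTF-8 bytes. It assumes "the PHP implementation" refers to PHP's built-in levenshtein(), which compares strings byte by byte. The function name is chosen for the example and this is not the patch's code.

```python
def edit_distance(a, b):
    """Edit distance over any two sequences (str or bytes)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


word, typo = "für", "fur"

# Compared as characters, one substitution separates the two words.
by_character = edit_distance(word, typo)

# Compared as UTF-8 bytes, "ü" occupies two bytes, so the same typo
# costs one substitution plus one deletion.
by_byte = edit_distance(word.encode("utf-8"), typo.encode("utf-8"))

print(by_character, by_byte)
```

A byte-based comparison therefore overstates the distance for words with multi-byte characters, which can push a good suggestion down the ranking or out of it.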