Test spelling suggestions for D7

robertdouglass's picture

http://drupal.org/node/247482

This patch adds spelling suggestions to the page that is returned if no search results are found.

This is done with the Levenshtein algorithm, which measures the nearness of two words by counting the single-character edits needed to turn one into the other.
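For readers unfamiliar with the measure, here is a minimal sketch of Levenshtein distance. It is written in Python purely as an illustration and is not the code from the patch; the function name and the example words are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("search", "serch"))   # 1: one deletion
print(levenshtein("drupal", "drupla"))  # 2: a swap counts as two edits
```

A lower number means a nearer word, so a distance of 0 is an exact match.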

The process for accomplishing spelling suggestions is:

  1. Create a new database table, search_spellings. It has two columns, word and characters. word is a unique word found in the search dataset (i.e. not subject to processing like stemming). characters is the character count of the word. It is used to restrict the set of words against which a spelling suggestion is sought, following the logic that a 3-letter word can't be a misspelling of a 10-letter word.
  2. On indexing, the words in the search_dataset are added to search_spellings, making a globally unique list of the words in the search index. A shutdown function is used for this purpose to avoid duplicate database queries.
  3. A new $op for hook_search has been added, 'no results'. This gives modules a chance to set their own message when no results are found. If the $op isn't implemented, the famous blue smurfs message is shown. node_search now uses the 'no results' op to look for spelling suggestions. It uses the new search API function search_spelling_suggestion($word), which returns the top spelling suggestion and its score.
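The three steps above can be sketched end to end. This is a rough Python illustration against an in-memory SQLite database, not the patch itself. The table and column names come from the description above, but the helper names, the length window of 2 characters, and the use of raw edit distance as the "score" are all assumptions.

```python
import sqlite3


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def index_words(db, words):
    """Step 2: add indexed words in one batch, keeping the list unique."""
    db.executemany(
        "INSERT OR IGNORE INTO search_spellings (word, characters) VALUES (?, ?)",
        [(word, len(word)) for word in set(words)],
    )


def spelling_suggestion(db, word, max_length_gap=2):
    """Step 3: return (suggestion, score) or None if no candidate exists.

    max_length_gap is a hypothetical parameter: only words whose length is
    within this many characters of the query are compared at all.
    """
    rows = db.execute(
        "SELECT word FROM search_spellings "
        "WHERE characters BETWEEN ? AND ? ORDER BY word",
        (len(word) - max_length_gap, len(word) + max_length_gap),
    )
    best = None
    for (candidate,) in rows:
        score = levenshtein(word, candidate)
        if best is None or score < best[1]:
            best = (candidate, score)
    return best


# Step 1: the table with its two columns.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE search_spellings ("
    "word TEXT PRIMARY KEY, characters INTEGER NOT NULL)"
)
index_words(db, ["module", "node", "taxonomy", "theme", "module"])
print(spelling_suggestion(db, "modle"))  # ('module', 1)
```

The characters filter keeps the expensive distance calculation away from words that could never be a near miss. In the example, "taxonomy" is never compared against "modle" because their lengths differ by three.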

Please note that the spelling suggestions don't in any way try to guarantee that the words are actually spelled correctly. Rather, if you type in a near miss, the system will suggest an alternate spelling based on what is in the index.

I didn't try to implement spelling suggestions on pages that have search results (a la Google), but that could be done later.

Comments

robertdouglass's picture

The latest patch has a Unicode-safe implementation of the Levenshtein algorithm that is an improvement over the PHP implementation.
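To illustrate why Unicode safety matters here, the sketch below compares the same pair of words once as characters and once as UTF-8 bytes. It is a Python illustration, not the patch's code. It rests on the assumption that a byte-oriented implementation (PHP's built-in levenshtein() compares bytes, as far as I know) sees a multi-byte character as several units. The function name and sample words are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance over any two sequences (str or bytes)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


word = "\u00fcber"  # "über", with u-umlaut as a single code point
typo = "uber"

by_character = edit_distance(word, typo)
by_byte = edit_distance(word.encode("utf-8"), typo.encode("utf-8"))

print(by_character)  # 1: one character substituted
print(by_byte)       # 2: the umlaut occupies two bytes in UTF-8
```

Comparing bytes inflates the distance for any word containing non-ASCII characters, which would push valid suggestions down the ranking on sites with non-English content.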
