I've been trying to wrap my head around ways to make Drupal's search more capable of fuzzy matching. Below are some goals for fuzzy search, along with a few comments. I'm not yet sure how all of this will fit together, but I feel strongly that, along with improving the search engine's speed, we should look at ways to provide more relevant results.
Fuzzy matching should handle:
- Synonym Matching
  - words with hyphenations
  - numbers typed as words or numerals (five or 5)
  - general synonyms (car or automobile, same thing)
  - same word used in different tense/plural (I traveled, he travels, you travel)
- Misspellings
  - Missing Characters
  - Transposed Characters (mostly happens when someone knows the spelling but doesn't type it correctly, e.g. tpyo)
  - Additional Characters (triggger vs trigger)
Handling these cases leads to solutions that are either language dependent or language independent:
- Use of PHP's levenshtein or similar_text functions (Language Independent) (levenshtein is quicker and returns the number of edits between two words; similar_text returns the number of matching characters and can also report a percentage of similarity)
- Stemming (Language Dependent) (this solves the issue of different tense/plural)
- Q-grams or N-grams (Language Independent) (Breaks each word into smaller overlapping strings and indexes each one: "apples" becomes app, ppl, ple, les. This can bloat the index and requires significantly more search queries, but it is a rather good solution for language-independent fuzzy matching. Another drawback is that it changes our search index, so large sites adopting it would need to reindex their existing content.)
- Suffix strings (Language Independent) (Along the same lines as the Q-gram or N-gram approach; the difference is the difficulty of the recursion needed to generate the suffix strings. Same drawback as q-grams: the site must be re-indexed.)
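To make the n-gram and suffix ideas a bit more concrete, here is a rough sketch in Python (not Drupal code). The function names ngrams, suffixes and ngram_overlap are made up for illustration, and the overlap measure (Jaccard similarity on the n-gram sets) is just one assumed way of comparing two words by their grams.

```python
def ngrams(word, n=3):
    """Return the overlapping n-character substrings of a word."""
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def suffixes(word, min_len=3):
    """Return every suffix of a word that is at least min_len characters long."""
    word = word.lower()
    return [word[i:] for i in range(len(word) - min_len + 1)]


def ngram_overlap(a, b, n=3):
    """Fraction of shared n-grams (Jaccard similarity on the two n-gram sets)."""
    grams_a, grams_b = set(ngrams(a, n)), set(ngrams(b, n))
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(ngrams("Apples"))    # ['app', 'ppl', 'ple', 'les']
print(suffixes("apples"))  # ['apples', 'pples', 'ples', 'les']
```

Each gram or suffix would become its own entry in the index, which is where the index bloat and the need to reindex come from.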
One way to cut down on search processing time is to provide fuzzy results only when regular matching finds nothing. In that case we would likely want to avoid language-dependent algorithms.
I have to go for now, but I wanted to throw up some information. I'm sure many of those reading this are already familiar with the above, but hopefully it's useful for someone. I'd really like to get more discussion on these ideas as well, because it will help me with my SoC project, and I want to be sure that the work I put in this summer is of value to the community.
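A minimal sketch of that fallback, again in Python rather than Drupal code. It assumes the index is just a list of lowercase words and that "close enough" means an edit distance of at most 2; both are assumptions for illustration, as are the names search and max_distance. The levenshtein function here is a plain implementation of the same measure that PHP's levenshtein reports.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def search(term, indexed_words, max_distance=2):
    """Return (matches, was_fuzzy): exact hits if any, otherwise near misses."""
    term = term.lower()
    exact = [w for w in indexed_words if w == term]
    if exact:
        return exact, False  # regular matching worked, skip the fuzzy pass
    fuzzy = [w for w in indexed_words if levenshtein(term, w) <= max_distance]
    fuzzy.sort(key=lambda w: levenshtein(term, w))  # closest words first
    return fuzzy, True


words = ["trigger", "typo", "travel"]
print(search("trigger", words))   # (['trigger'], False)
print(search("triggger", words))  # (['trigger'], True)
```

Note that plain Levenshtein counts a transposition such as tpyo as two edits, so the threshold has to allow for that if transposed characters are to be caught.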
I'll be back with more later.

Comments
existing capabilities, hooks, and prototyping new features
There is already a porterstemmer module that uses hook_search_preprocess. I believe that all your other items could be built using this hook. It would be nice to have a supplemental search pack that bundles modules like porterstemmer and the ones you are contemplating.
It's probably already too late for 6.x, but it would be nice to scope and prototype this for the future. I recommend trying to implement a couple of these and, if necessary, suggesting new hooks for core search that allow them to happen. You seem to have some specific ideas about algorithms. I presume that these would require different data storage and retrieval. Since faceted search (see nina) also has different storage and retrieval requirements, one thing we need to consider is a better abstraction of this. Can we use the current hook_search and hook_search_preprocess to accomplish this?
Doug Green
www.douggreenconsulting.com
www.dougjgreen.com
scoring fuzzy matches
I'd really like to get hook_node_rank #145242 into core, which would let you alter the score ranking based on "fuzzy" matches. Everything else being equal, a fuzzy match should have a lower score than an exact match.
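As a rough illustration of that ranking rule (in Python, not the hook_node_rank API, whose interface is not shown here): the sketch below scales a base score down for anything that is not an exact match. The names adjusted_score and fuzzy_penalty, and the 0.5 default, are assumptions for the example. Python's difflib ratio stands in for a similarity percentage and is only roughly analogous to PHP's similar_text.

```python
from difflib import SequenceMatcher


def similarity(term, word):
    """Similarity in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, term.lower(), word.lower()).ratio()


def adjusted_score(base_score, term, word, fuzzy_penalty=0.5):
    """Scale a relevance score so that fuzzy matches rank below exact ones."""
    if term.lower() == word.lower():
        return base_score  # exact match keeps its full score
    # Fuzzy match: apply a fixed penalty, then weight by how close the words are.
    return base_score * fuzzy_penalty * similarity(term, word)
```

With a penalty below 1, a fuzzy match always scores lower than an exact match that has the same base score, and closer misspellings still rank above more distant ones.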
Doug Green
www.douggreenconsulting.com
www.dougjgreen.com