I've recently taken over maintenance of the Porter Stemmer module, and I also have another search-related module on d.o called Search by Page. Both rely on and expand core Search technology.
In the course of maintaining those two modules, users have reported two problems that turned out to be bugs in core Search. Although it would be great to get them both fixed in Drupal 6, I think that's unlikely... but I would really like to get them fixed in Drupal 7. I submitted patches for both of them, but they've been sitting in "needs review" status for weeks, and the code freeze is coming up soon. So if anyone has time to review them, or to comment on their importance (or to convince me they aren't worth fixing), I'd be grateful.
- http://drupal.org/node/493270 - This issue concerns the search_excerpt() function, which doesn't currently work well with stemming modules: when it generates the excerpt, it only matches the search keywords literally, so words in the text that share a stem with a keyword are not highlighted. My patch proposes adding a hook that would let a stemmer module highlight those stemmed equivalents. There's also a Drupal 6 implementation for Porter Stemmer there, as a (working) illustration.
- http://drupal.org/node/511594 - This issue concerns stemming on a multilingual site. Currently, hook_search_preprocess() does not tell stemming modules what language the text to be preprocessed is in, so on a multilingual site you could be stemming your Spanish content and search terms with an English-language stemmer, or vice versa. This is bad, because stemming algorithms are very language-specific. The patch adds a new argument to hook_search_preprocess() so that a stemming module can skip text that isn't in its own language.
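To make the two problems a bit more concrete, here is a rough sketch of the ideas in Python rather than Drupal PHP. It is not the code from either patch: toy_stem(), preprocess() and highlight() are hypothetical names, the suffix stripping is a crude stand-in for the real Porter algorithm, and the "en"/"es" language codes are only assumed for illustration.

```python
import re


def toy_stem(word, langcode):
    """Hypothetical stand-in for a stemmer; NOT the Porter algorithm."""
    if langcode != "en":
        # A language-aware stemmer leaves other languages untouched.
        return word
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Strip the first matching suffix, keeping at least 3 characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, langcode):
    """Stem every whitespace-separated word, given the text's language."""
    return " ".join(toy_stem(w, langcode) for w in text.split())


def highlight(text, keys, langcode):
    """Mark words whose stem matches the stem of any search keyword."""
    key_stems = {toy_stem(k, langcode) for k in keys}

    def mark(match):
        word = match.group(0)
        if toy_stem(word, langcode) in key_stems:
            return "<strong>" + word + "</strong>"
        return word

    return re.sub(r"\w+", mark, text)
```

With this sketch, a search for "walk" highlights both "walked" and "walking" in an English excerpt, while Spanish text passed with "es" comes back from preprocess() unchanged instead of being mangled by English rules.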
Thanks in advance for any time you can spend reviewing these patches, commenting, etc. And of course, please put your comments on the issues and not here!
