Posted by mkalkbrenner on March 31, 2010 at 2:54pm
We just released the first alpha version of Apache Solr Multilingual, which supports language-specific stemming, synonyms, and compound word splitting. There's still a lot to do, but any feedback at this early stage of the project will be helpful.

Comments
Great!
This is really interesting and exactly what I'm looking for :). I will try it soon on a three-language website.
How does it handle content with no specified language? Can the user limit a search to one specified language? Can we use language as a facet?
Thanks
Your Questions
If you install the latest development snapshot instead of alpha 1, you'll find an option to map language-neutral content to a language.
Use the facet as described below or write some code and implement apachesolr_modify_query.
Or describe your needs in detail and open a feature request at http://drupal.org/project/issues/apachesolr_multilingual?categories=All
Yes, simply turn on the language facet that comes with apachesolr.
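For anyone wondering what the language limit and the language facet amount to on the Solr side, here is a rough sketch, not the module's actual code. It only builds standard Solr request parameters (`q`, `fq`, `facet`, `facet.field`). The function name `example_language_params` is made up for illustration, and the sketch assumes the language code is indexed in a field called `language`.

```php
<?php
/**
 * Hypothetical helper (not part of apachesolr or apachesolr_multilingual).
 * Builds Solr request parameters that facet on the language field and,
 * optionally, restrict the results to a single language.
 * Assumption: the language code is indexed in a field named "language".
 */
function example_language_params($keywords, $langcode = NULL) {
  $params = array(
    'q' => $keywords,
    // Ask Solr to return result counts per language.
    'facet' => 'true',
    'facet.field' => 'language',
  );
  if ($langcode !== NULL) {
    // A filter query narrows the result set without affecting scoring.
    $params['fq'] = 'language:"' . $langcode . '"';
  }
  return $params;
}

// Example: search for "genome" in German content only.
$params = example_language_params('genome', 'de');
echo http_build_query($params), "\n";
```

Calling the helper without a language code leaves out the `fq` parameter, so all languages are searched and the facet counts show how the hits are spread across them.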
Personal Genomics Services, bio.logis GmbH
Thanks a Lot!
Wonderful module!
How do I apply stemming for the Hebrew language?
Does it know how to break up words on its own or do I need to insert some custom code?
Perhaps there is some ready-made algorithm I could use?
Thanks,
Nimi
Built-In Stemmers
Thanks for your feedback. Unfortunately, Hebrew is not available as a stemmer within Solr:
<?php
static $available_stemmers = array(
  'da' => 'Danish',
  'nl' => 'Dutch',
  'en' => 'English',
  'fi' => 'Finnish',
  'fr' => 'French',
  'de' => 'German',
  'it' => 'Italian',
  'nn' => 'Norwegian',
  'nb' => 'Norwegian',
  'pt-br' => 'Portuguese',
  'pt-pt' => 'Portuguese',
  'ro' => 'Romanian',
  'ru' => 'Russian',
  'es' => 'Spanish',
  'sv' => 'Swedish',
  'tr' => 'Turkish',
);
?>
Maybe there's an extension that adds more stemmers to Solr or Lucene ...
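To check quickly whether a given language code is covered by the list above, something like the following would do. This is only a sketch: `example_stemmer_for` is a hypothetical helper, not a function provided by the module, and it simply looks the code up in a copy of that list.

```php
<?php
/**
 * Hypothetical helper (not part of the module): returns the stemmer name
 * for a language code, or FALSE if the list has no entry for it.
 */
function example_stemmer_for($langcode) {
  // Copy of the list of built-in stemmers quoted above.
  static $available_stemmers = array(
    'da' => 'Danish',
    'nl' => 'Dutch',
    'en' => 'English',
    'fi' => 'Finnish',
    'fr' => 'French',
    'de' => 'German',
    'it' => 'Italian',
    'nn' => 'Norwegian',
    'nb' => 'Norwegian',
    'pt-br' => 'Portuguese',
    'pt-pt' => 'Portuguese',
    'ro' => 'Romanian',
    'ru' => 'Russian',
    'es' => 'Spanish',
    'sv' => 'Swedish',
    'tr' => 'Turkish',
  );
  return isset($available_stemmers[$langcode]) ? $available_stemmers[$langcode] : FALSE;
}

var_dump(example_stemmer_for('es'));  // string(7) "Spanish"
var_dump(example_stemmer_for('he'));  // bool(false), Hebrew is not in the list
```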
Personal Genomics Services, bio.logis GmbH
I see. Oh well, I guess I would have to wait until a Hebrew stemmer is developed.
Thanks.
Wow, great to see this module.
I am going to give it a try. I am implementing Apache Solr for a multilingual site in Spanish and Portuguese. Luckily, both of these languages are available as stemmers in this module.
~ Ipsita Mishra
We will add new stemmers
Our task list now includes implementing all the new features described at
http://wiki.apache.org/solr/LanguageAnalysis
If anyone is interested in testing new languages, please participate at http://drupal.org/node/896896
Personal Genomics Services, bio.logis GmbH