Apache Solr Multilingual - Non-English and Multilingual Search

mkalkbrenner

We just released the first alpha version of Apache Solr Multilingual, which supports language-specific stemming, synonyms, and compound word splitting. There's still a lot to do, but any feedback at this early stage of the project will be helpful.

Comments

Great!

gagarine

This is really interesting and exactly what I'm looking for :). I will try it soon on a three-language website.

How does it handle content with no specified language? Can the user limit a search to one specific language? Can we use language as a facet?

Thanks

Your Questions

mkalkbrenner

How does it handle content with no specified language?

If you install the latest development snapshot instead of alpha 1, you'll find an option to map language-neutral content to a language.
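To illustrate what such a mapping does, here is a minimal PHP sketch. The function `example_effective_language()` and its parameters are hypothetical and not part of the module; the only assumption is Drupal's convention that language-neutral content carries an empty language code in Drupal 6 and `'und'` in Drupal 7.

```php
<?php
/**
 * Hypothetical helper (not part of apachesolr_multilingual): returns the
 * language whose analysis (stemming, synonyms, ...) should apply to a node.
 *
 * @param string $node_language    The node's language code.
 * @param string $neutral_mapping  Language configured for neutral content.
 */
function example_effective_language($node_language, $neutral_mapping) {
  // Drupal 6 marks language-neutral content with '', Drupal 7 with 'und'.
  if ($node_language === '' || $node_language === 'und') {
    return $neutral_mapping;
  }
  // Content with an explicit language keeps it.
  return $node_language;
}

echo example_effective_language('', 'en'), "\n";   // en
echo example_effective_language('fr', 'en'), "\n"; // fr
```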

Can the user limit a search to one specific language?

Use the language facet as described below, or write some code that implements the apachesolr_modify_query hook.
Alternatively, describe your needs in detail and open a feature request at http://drupal.org/project/issues/apachesolr_multilingual?categories=All

Can we use language as a facet?

Yes, simply turn on the language facet that comes with apachesolr.

Thanks a lot!

nimi

Wonderful module!

How do I apply stemming for the Hebrew language?
Does it know how to break up words on its own or do I need to insert some custom code?

Perhaps there is some ready-made algorithm I could use?

Thanks,
Nimi

Built-In Stemmers

mkalkbrenner

Thanks for your feedback. Unfortunately, Hebrew is not among the stemmers available within Solr:

<?php
// Drupal language code => Snowball stemmer language name.
// 'nn' (Nynorsk) and 'nb' (Bokmål) share the Norwegian stemmer.
static $available_stemmers = array(
  'da'    => 'Danish',
  'nl'    => 'Dutch',
  'en'    => 'English',
  'fi'    => 'Finnish',
  'fr'    => 'French',
  'de'    => 'German',
  'it'    => 'Italian',
  'nn'    => 'Norwegian',
  'nb'    => 'Norwegian',
  'pt-br' => 'Portuguese',
  'pt-pt' => 'Portuguese',
  'ro'    => 'Romanian',
  'ru'    => 'Russian',
  'es'    => 'Spanish',
  'sv'    => 'Swedish',
  'tr'    => 'Turkish',
);
?>

Maybe there's an extension that adds more stemmers to Solr or Lucene ...
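Checking whether a given language code is covered amounts to a plain lookup in that table. The sketch below is hypothetical (`example_stemmer_for_language()` is not a module function); it copies the list above and returns `NULL` for codes such as `'he'` (Hebrew) that have no built-in stemmer.

```php
<?php
// Hypothetical helper for illustration only; not part of
// apachesolr_multilingual. The table copies the stemmer list quoted above.
function example_stemmer_for_language($langcode) {
  static $available_stemmers = array(
    'da'    => 'Danish',
    'nl'    => 'Dutch',
    'en'    => 'English',
    'fi'    => 'Finnish',
    'fr'    => 'French',
    'de'    => 'German',
    'it'    => 'Italian',
    'nn'    => 'Norwegian',
    'nb'    => 'Norwegian',
    'pt-br' => 'Portuguese',
    'pt-pt' => 'Portuguese',
    'ro'    => 'Romanian',
    'ru'    => 'Russian',
    'es'    => 'Spanish',
    'sv'    => 'Swedish',
    'tr'    => 'Turkish',
  );
  // NULL means there is no built-in stemmer for this language code.
  return isset($available_stemmers[$langcode]) ? $available_stemmers[$langcode] : NULL;
}

var_dump(example_stemmer_for_language('pt-br')); // string(10) "Portuguese"
var_dump(example_stemmer_for_language('he'));    // NULL
```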

nimi

I see. Oh well, I guess I'll have to wait until a Hebrew stemmer is developed.

Thanks.

ipsitamishra

Wow, great to see this module.

I am going to give it a try. I am implementing Apache Solr for a multilingual site in Spanish and Portuguese. Luckily, both languages are available as stemmers in this module.

~ Ipsita Mishra

We will add new stemmers

mkalkbrenner

We have added to our task list the implementation of all the new language analysis features described at
http://wiki.apache.org/solr/LanguageAnalysis

If anyone is interested in testing new languages, please participate at http://drupal.org/node/896896

Lucene, Nutch and Solr
