Optimizing Apachesolr for non-English languages
I have done a lot of research on optimizing Apachesolr for non-English languages, especially German. It turns out that search results can be dramatically improved by adjusting Solr's stemming and by breaking up compound words. Both can be achieved with small changes to Apachesolr's schema.xml.
This post is about configuring stemming:
http://www.early-dance.de/en/news/9188-optimizing-apachesolr-non-english...
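For readers who want a rough idea before following the link, here is a minimal sketch of what a stemming change in schema.xml could look like. It is not taken from the linked post: the field type name "text_de" is hypothetical, and the tokenizer and filter chain are assumptions, so compare it against the schema.xml shipped with your Apachesolr version before changing anything.

```xml
<!-- Sketch only: "text_de" is a hypothetical field type name. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Use a German Snowball stemmer instead of an English one -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>
```

After a change to the analyzer chain, the content has to be reindexed so that the stored terms match the new stemming.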
And this post is about compound word splitting, which is needed in languages like German that have long combined words such as "Dampfschifffahrt":
http://www.early-dance.de/en/news/9189-apachesolr-issues-german-and-othe...
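As a rough illustration of compound splitting, Solr ships a dictionary-based filter that can be added to a field's analyzer. The sketch below is not copied from the linked post: the field type name "text_de_compound", the dictionary file name "compoundwords.txt", and the size values are all assumptions you would need to tune for your own index.

```xml
<!-- Sketch only: field type name, dictionary file name and sizes are assumptions. -->
<fieldType name="text_de_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits dictionary words found inside longer tokens as additional tokens -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compoundwords.txt"
            minWordSize="5"
            minSubwordSize="4"
            maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
```

With a dictionary containing lowercase entries such as "dampf", "schiff" and "fahrt" (one word per line), a token like "dampfschifffahrt" should also become findable through its parts. The quality of the results depends heavily on the word list.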


Thanks!
That's great information. A lot of people ask about this and first hand experiences are super helpful.
I made my first proposals
I made my first proposals regarding these issues half a year ago:
http://drupal.org/node/463886
We also tweaked word splitting on one index in our production environment, but haven't provided a patch yet.
Markus Kalkbrenner
Cocomore AG
drupal.cocomore.com
Yes Markus, your post was
Yes Markus, your post was very helpful and encouraged me to do further research.
But what surprised me most was that a compound splitter is already built into Apachesolr!
compound splitter
We had a look at the compound splitter and will integrate it shortly into our localized version of apachesolr, which is freely available here:
http://drupal.cocomore.com/de/project/apachesolr
BTW, besides a compound splitter patch, we're currently working on xi:include patches and a German manual.
Hi Markus, this sounds
Hi Markus, this definitely sounds interesting. I'll check it out!
Stefan