Posted by fp on December 14, 2009 at 6:49pm
I am trying to run apachesolr on a site which for now has only French content.
I have attached both the schema.xml I use and the query results from solr for a query on the word "Vidéocassettes".
From what I have gathered so far, I assumed that the following filter
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
would take care of mapping the accents for both the indexing and the querying. If I remove the accent from my query (e.g. "Videocassettes"), I get the expected results, which leads me to think that the character mapping is working at index time. However, the accented query returns no results.
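To make "both the indexing and the querying" concrete, here is a minimal sketch of a field type where the charFilter appears in each analyzer chain. The field type name, the tokenizer and the omitted filters are assumptions for illustration; the attached schema.xml is not reproduced here and may differ.

```xml
<!-- Sketch only: name, tokenizer and remaining filters are assumed, not taken from the attachment -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Folds accented characters (é -> e) before tokenizing at index time -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- The same mapping has to be applied to the query text as well -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, the mapping can only work on the query side if the accented characters reach Solr intact in the first place.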
What am I missing?
Thanks!
fp
| Attachment | Size |
|---|---|
| schema.xml.txt | 10.11 KB |
| results.xml.txt | 2.45 KB |

Comments
Might be a Tomcat configuration issue
I have encountered the same issue, which was fixed by adding the URIEncoding="UTF-8" attribute to the right Connector element in Tomcat's server.xml.
// David Lesieur // Partner // Whisky Echo Bravo // Web development, Drupal experts // Montréal //
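For reference, a Connector carrying this attribute might look like the sketch below. The port, protocol, connectionTimeout and redirectPort values are the stock Tomcat 6 defaults and are assumptions here; only URIEncoding="UTF-8" comes from this thread. Tomcat needs a restart after the change.

```xml
<!-- tomcat6/conf/server.xml: the HTTP connector that serves Solr (values other than URIEncoding are assumed defaults) -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
<!-- URIEncoding makes Tomcat decode %C3%A9 in the query string as UTF-8 "é" rather than as two ISO-8859-1 characters -->
```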
Indeed
Fantastic! Thanks for your timely response David. Much appreciated.
Is this solution really working?
I tried it, but maybe I made a mistake. Are you talking about the server.xml located in tomcat6/conf?
My problem is that if I index data with accents like É, À, È, it is not listed alphabetically with the E or A entries but at the end of the list.
I tried reindexing with your modification and the accented entries are still sorted at the end of the list. Am I missing something?
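One avenue that may be worth checking, offered as an untested sketch: Solr sorts on the indexed terms of the sort field, so accents are only folded for ordering if that field's own analyzer applies the mapping. The field type name sortString and the filter list below are assumptions, not taken from the attached schema, and the field being sorted on would need to use this type and be reindexed.

```xml
<!-- Hypothetical sort field type: one token per value, accents folded, case ignored -->
<fieldType name="sortString" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Map É, À, È to E, A, E before the value is indexed -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- Keep the whole value as a single token so the field stays sortable -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```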
Thanks!
The moon is closer to the sun than I am to anyone.
Beyond the server.xml
Yes, you are referring to the correct server.xml. I assume that you have restarted Tomcat...
Have you had a look at the schema.xml(.txt) file that I posted originally? There are a couple of important bits, such as mapping-ISOLatin1Accent.txt for the charFilter and <filter class="solr.SnowballPorterFilterFactory" language="French"/>.
You seem to have modified the
You seem to have modified the stopwords to be French. Do you have an example of what you did in this file?
I think it's the only thing that I'm missing.
Thank you for your help, I really appreciate it!
One other thing to take into
One other thing to take into account in the server.xml is to remove useBodyEncodingForURI from the Connector.
I spent hours trying to figure out why UTF-8 wasn't working while I had URIEncoding="UTF-8" enabled. After a whole lot of testing, it was the useBodyEncodingForURI that was causing the issue.