Search on Mercury is provided by ApacheSolr, which is hosted in Tomcat. Tomcat is running on port 80. When a request contains non-ASCII characters, they are sent as percent-encoded UTF-8, and Tomcat can't deal with that by default. See: http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
Quote:
If you are going to query Solr using international characters (>127) using HTTP-GET, you must configure Tomcat to conform to the URI standard by accepting percent-encoded UTF-8.
Edit Tomcat's conf/server.xml and add the following attribute to the correct Connector element: URIEncoding="UTF-8".
<Server ...>
  <Service ...>
    <Connector ... URIEncoding="UTF-8"/>
  </Service>
</Server>
This is only an issue when sending non-ascii characters in a query request... no configuration is needed for Solr/Tomcat to return non-ascii chars in a response, or accept non-ascii chars in an HTTP-POST body.
After following the above advice, the ApacheSolr search works fine for non-English words, such as CJK text. I thought I'd share this here, since it might be an issue for others with multilingual sites or non-English user content.
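To make the failure mode concrete, here is a minimal Python sketch of what happens to a CJK search term in a GET request. It assumes the client percent-encodes the query as UTF-8 and that an unconfigured Tomcat decodes the URI as ISO-8859-1 (the default in older Tomcat versions, as far as I know). The search term is just an illustrative value, not something from our site.

```python
from urllib.parse import quote, unquote

# Hypothetical search term, used only for illustration.
query = "検索"

# A client sends the term as percent-encoded UTF-8 bytes in the URL.
encoded = quote(query)

# Decoding with the right charset recovers the original term.
as_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 (assumed server default)
# turns each UTF-8 byte into a separate, unrelated character.
as_latin1 = unquote(encoded, encoding="iso-8859-1")

print(encoded)
print(as_utf8 == query)
print(len(as_latin1))
```

The two characters become six bytes on the wire, so a server that decodes them one byte at a time hands Solr six meaningless characters instead of the original term, and the search finds nothing.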
Other references:
http://groups.drupal.org/node/39760#comment-112236
http://drupal.org/node/887064#comment-3344816
Comments
re: Tomcat tweak for ApacheSolr powered search on non English wo
Thanks for the tip - we're going to do some testing and add it to Pantheon (Also added it to the documentation of this project).
Greg
--
Greg Coit
Systems Administrator
http://www.chapterthree.com
Do note that it's also
Do note that it's also important to remove useBodyEncodingForURI from that Connector. I updated the documentation on troubleshooting Solr to reflect this.
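For anyone who wants to check an existing config for both points, a rough Python sketch along these lines might help. The function name, the sample XML, and the port numbers are all made up for illustration; point it at the contents of your own conf/server.xml instead.

```python
import xml.etree.ElementTree as ET

def check_connectors(server_xml_text):
    """Return a list of (port, problems) tuples, one per Connector element."""
    root = ET.fromstring(server_xml_text)
    report = []
    for conn in root.iter("Connector"):
        problems = []
        # The Connector should declare URIEncoding="UTF-8".
        if conn.get("URIEncoding", "").upper() != "UTF-8":
            problems.append("URIEncoding is not UTF-8")
        # useBodyEncodingForURI should not be present on the Connector.
        if "useBodyEncodingForURI" in conn.attrib:
            problems.append("useBodyEncodingForURI is set")
        report.append((conn.get("port"), problems))
    return report

# Hypothetical sample config, not taken from any real server.
SAMPLE = """<Server port="8005">
  <Service name="Catalina">
    <Connector port="8080" URIEncoding="UTF-8"/>
    <Connector port="8009" useBodyEncodingForURI="true"/>
  </Service>
</Server>"""

report = check_connectors(SAMPLE)
for port, problems in report:
    print(port, problems or "ok")
```

An empty problem list means that Connector matches the advice in this thread; anything else names the attribute to fix.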
Thx
Thx for sharing Wim! Very helpful.