Tomcat tweak for ApacheSolr-powered search on non-English words

toemaz's picture

Search on Mercury is provided by ApacheSolr, which is hosted in Tomcat. Tomcat is running on port 80. When a request contains UTF-8 characters, they are percent-encoded, and Tomcat can't decode them correctly by default. See: http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Quote :

If you are going to query Solr using international characters (>127) using HTTP-GET, you must configure Tomcat to conform to the URI standard by accepting percent-encoded UTF-8.

Edit Tomcat's conf/server.xml and add the following attribute to the correct Connector element: URIEncoding="UTF-8".

<Server ...>
<Service ...>
   <Connector ... URIEncoding="UTF-8"/>
   ...
</Service>
</Server>

This is only an issue when sending non-ascii characters in a query request... no configuration is needed for Solr/Tomcat to return non-ascii chars in a response, or accept non-ascii chars in an HTTP-POST body.

After following the above advice, ApacheSolr search works for non-English words, such as CJK text. I thought I'd share this here, since it might be an issue for others with multilingual sites or non-English user content.
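For anyone wondering what actually goes wrong, here is a minimal Python sketch of the mismatch. It assumes the container decodes the query string as ISO-8859-1 (the default URI encoding in older Tomcat versions, as far as I know); the search term is just an illustrative example, not something from our site.

```python
from urllib.parse import quote, unquote

# Hypothetical search term containing characters above code point 127.
query = "東京"

# A browser or HTTP client sends the UTF-8 bytes percent-encoded in the URL.
encoded = quote(query)

# Decoding with the right charset recovers the original term.
as_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 (assumed container default)
# turns each UTF-8 byte into a separate character, so Solr sees garbage.
as_latin1 = unquote(encoded, encoding="iso-8859-1")

print(encoded)
print(as_utf8 == query)
print(len(query), len(as_latin1))
```

The two-character term becomes six unrelated characters when decoded with the wrong charset, which is why the search returns nothing useful until URIEncoding="UTF-8" is set.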

Other references:
http://groups.drupal.org/node/39760#comment-112236
http://drupal.org/node/887064#comment-3344816

Comments

Greg Coit's picture

Thanks for the tip - we're going to do some testing and add it to Pantheon (Also added it to the documentation of this project).

Greg

--
Greg Coit
Systems Administrator
http://www.chapterthree.com


wmostrey's picture

Do note that it's also important to remove useBodyEncodingForURI from that Connector. I updated the documentation on troubleshooting Solr to reflect this.
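A rough way to check both settings at once is to parse server.xml and inspect each Connector. The Python sketch below is only an illustration: the function name check_connectors and the sample XML are made up for this example, and not every Connector necessarily needs the attribute, so treat the output as hints rather than errors.

```python
import xml.etree.ElementTree as ET

def check_connectors(server_xml_text):
    """Return a list of warnings for Connector elements in a server.xml string."""
    warnings = []
    root = ET.fromstring(server_xml_text)
    for conn in root.iter("Connector"):
        port = conn.get("port", "?")
        # The Connector serving Solr should decode URIs as UTF-8.
        if conn.get("URIEncoding", "").upper() != "UTF-8":
            warnings.append(f"port {port}: URIEncoding is not UTF-8")
        # useBodyEncodingForURI should not be present on that Connector.
        if "useBodyEncodingForURI" in conn.attrib:
            warnings.append(f"port {port}: useBodyEncodingForURI is set")
    return warnings

# Hypothetical sample configuration, not a real server.xml.
sample = """<Server port="8005">
  <Service name="Catalina">
    <Connector port="80" URIEncoding="UTF-8"/>
    <Connector port="8009" useBodyEncodingForURI="true"/>
  </Service>
</Server>"""

for line in check_connectors(sample):
    print(line)
```

To run it against a real installation, read the contents of conf/server.xml into a string and pass that to check_connectors instead of the sample.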

Thx

toemaz's picture

Thx for sharing Wim! Very helpful.
