Thanks for the information about URIEncoding="UTF-8" in the Tomcat conf file, but that doesn't answer my main concerns:
- What is the character encoding of the text in the title_fr field?
- Is there any way to force it to be UTF-8?
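One way to approach the first question is to stop trusting what a browser or editor renders and look at the actual code points Solr returns for title_fr. The sketch below does that. It also includes a heuristic check and repair for the common case where UTF-8 bytes were read as ISO-8859-1 somewhere on the indexing path. That this is what happened to title_fr is an assumption; the thread doesn't establish it. The helper names `describe`, `looks_double_encoded` and `repair` are hypothetical, not part of Solr or PHP.

```python
def describe(value):
    """List each character as U+XXXX so you see what the field really holds."""
    return [f"U+{ord(ch):04X}" for ch in value]


def looks_double_encoded(value):
    """Heuristic (assumption): True if value looks like UTF-8 bytes mis-decoded as Latin-1.

    Re-encoding as Latin-1 recovers the original bytes; if those bytes are valid
    UTF-8 and decode to something different, the value was probably double-encoded.
    Mojibake produced via Windows-1252 (e.g. containing U+20AC) is not covered.
    """
    try:
        repaired = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != value


def repair(value):
    """Undo one round of UTF-8-read-as-Latin-1, leaving other strings untouched."""
    if looks_double_encoded(value):
        return value.encode("latin-1").decode("utf-8")
    return value


# A correctly stored "é" is the single code point U+00E9; a mis-decoded one
# shows up as the two code points U+00C3 U+00A9 ("Ã©").
print(describe("appel\u00c3\u00a9"))
print(repair("appel\u00c3\u00a9 au t\u00c3\u00a9l\u00c3\u00a9phone"))
```

If `describe` shows single code points such as U+00E9, the stored value is already fine and the problem is in how the response is being displayed. If it shows pairs such as U+00C3 U+00A9, the corruption happened before or during indexing. In that case, fixing the ingest path is preferable to repairing values after the fact.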
On Tue, Jul 29, 2014 at 8:35 AM, <aurelien.mazo...@francelabs.com> wrote:
> Hi,
>
> If you use Solr 4.8.1, you don't have to add URIEncoding="UTF-8" in the
> Tomcat conf file anymore:
> https://wiki.apache.org/solr/SolrTomcat
>
> Regards,
>
> Aurélien MAZOYER
>
> On 29.07.2014 14:22, Gulliver Smith wrote:
>>
>> I have Solr 4.8.1 under Tomcat 7 on Debian Linux. The connector in
>> Tomcat's server.xml has been changed to include character encoding
>> UTF-8:
>>
>> <Connector port="8080" protocol="HTTP/1.1"
>>            URIEncoding="UTF-8"
>>            connectionTimeout="20000"
>>            redirectPort="8443" />
>>
>> I am posting to the server from PHP 5.5 curl. I intercepted the extract
>> POST and confirmed that everything is being encoded in UTF-8.
>>
>> However, the responses to query commands, whether XML or JSON, are
>> returning field values such as title_fr in something that looks like
>> latin1 (ISO-8859-1) when displayed in a browser or editor.
>>
>> E.g.: "title_fr":[" appelé au téléphone"]
>>
>> The highlights in the query response, however, display the characters
>> correctly.
>>
>> E.g.: "text_fr":[" \n \n \n \n \n \n \n \n \n \n \nappelé au
>> téléphone\nappelé au téléphone\n
>>
>> Running title_fr through PHP's utf8_decode doesn't produce readable text
>> either.
>>
>> Is there something to configure to fix this and get proper UTF-8
>> results for everything?
>>
>> Thanks
>> Gulliver
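For context on what the Connector's URIEncoding setting actually governs, the sketch below shows it in isolation. It determines which character set Tomcat uses to turn percent-encoded bytes in the request URI (such as a GET query string) back into characters. Tomcat 7's default is ISO-8859-1. The sketch uses Python's `urllib.parse` purely as a stand-in for the server-side decoding step; it is not Tomcat or Solr code.

```python
from urllib.parse import quote, unquote

# A query term as a UTF-8 client (such as PHP curl) would put it in a URL:
# each non-ASCII character becomes its UTF-8 bytes, percent-encoded.
term = "t\u00e9l\u00e9phone"           # "téléphone"
encoded = quote(term)                  # 't%C3%A9l%C3%A9phone'

# Decoding those bytes as UTF-8 recovers the original term ...
as_utf8 = unquote(encoded, encoding="utf-8")

# ... while decoding them as ISO-8859-1 yields the familiar "Ã©" mojibake.
as_latin1 = unquote(encoded, encoding="latin-1")

print(encoded)
print(as_utf8)
print(as_latin1)
```

This setting concerns the request URI only. It does not control how a POST body (such as the extract upload) is decoded, nor how responses are encoded. So, under these assumptions, it is unlikely to explain a stored title_fr value that was indexed via POST. That fits the point in the thread that Solr 4.8.1 doesn't need it.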