Hi,

If you are seeing " appelé au téléphone" in the browser, I would guess that the data is being sent as UTF-8 by your server, and that the content type of the HTML is either set to ISO-8859-1 or not set at all, in which case your browser is defaulting to ISO-8859-1.
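To illustrate that kind of mismatch, here is a minimal sketch in Python (chosen only for illustration; it is not part of the Solr/PHP setup discussed in this thread). It assumes the server really emits UTF-8 bytes and the client decodes them as ISO-8859-1; the variable names are hypothetical.

```python
# Sketch: what happens when UTF-8 bytes are decoded as ISO-8859-1.
# Assumption: the server sends correct UTF-8 and only the client-side
# decoding is wrong. All names here are illustrative.

text = "appelé au téléphone"

# What a UTF-8 server would put on the wire: each "é" becomes two bytes.
utf8_bytes = text.encode("utf-8")

# What a client sees if it assumes ISO-8859-1: each byte becomes one character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # appelÃ© au tÃ©lÃ©phone

# If the only problem is the wrong decoding, the text is recoverable by
# reversing the step: re-encode as ISO-8859-1, then decode as UTF-8.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired == text)  # True
```

If the garbled text cannot be repaired this way, the data was probably damaged before it was stored, not just mislabelled on the way out.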
You can force the encoding to UTF-8 in the browser; this is usually a menu item (in Chrome/Safari/Firefox).

FWIW, having messed around with this kind of stuff in the past, I always generate UTF-8 and always set the HTML content type to UTF-8 with:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Cheers

François

On Jul 29, 2014, at 3:59 PM, Gulliver Smith <gulliver.m.sm...@gmail.com> wrote:

> Thanks for the information about URIEncoding="UTF-8" in the tomcat
> conf file, but that doesn't answer my main concerns:
> - what is the character encoding of the text in the title_fr field?
> - is there any way to force it to be UTF-8?
>
> On Tue, Jul 29, 2014 at 8:35 AM, <aurelien.mazo...@francelabs.com> wrote:
>> Hi,
>>
>> If you use solr 4.8.1, you don't have to add URIEncoding="UTF-8" in the
>> tomcat conf file anymore :
>> https://wiki.apache.org/solr/SolrTomcat
>>
>>
>> Regards,
>>
>> Aurélien MAZOYER
>>
>>
>> On 29.07.2014 14:22, Gulliver Smith wrote:
>>>
>>> I have solr 4.8.1 under Tomcat 7 on Debian Linux. The connector in
>>> Tomcat's server.xml has been changed to include character encoding
>>> UTF-8:
>>>
>>> <Connector port="8080" protocol="HTTP/1.1"
>>> URIEncoding="UTF-8"
>>> connectionTimeout="20000"
>>> redirectPort="8443" />
>>>
>>>
>>> I am posting to the server from PHP 5.5 curl. The extract POST was
>>> intercepted and confirmed that everything is being encoded in UTF-8.
>>>
>>> However, the responses to query commands, whether XML or JSON are
>>> returning field values such as title_fr in something that looks like
>>> latin1 or iso-8859-1 when displayed in a browser or editor.
>>>
>>> E.g.: "title_fr":[" appelé au téléphone"]
>>>
>>> The highlights in the query response do have correctly displaying
>>> character codes.
>>>
>>> E.g. "text_fr":[" \n \n \n \n \n \n \n \n \n \n \nappelé au
>>> téléphone\nappelé au téléphone\n
>>>
>>> PHP's utf8_decode doesn't make sense of the title_fr.
>>>
>>> Is there something to configure to fix this and get proper UTF8
>>> results for everything?
>>>
>>> Thanks
>>> Gulliver