Hi

If you are seeing " appelé au téléphone" in the browser, I would guess that 
your server is sending the data as UTF-8 while the content type of the HTML is 
either set to iso-8859-1 or not set at all, in which case your browser is 
defaulting to iso-8859-1. 
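
To illustrate the mismatch I am guessing at, here is a minimal Python sketch. The sample string is taken from the thread; the variable names are only for illustration, and this is not what Solr or Tomcat do internally. It shows what happens when UTF-8 bytes are read as iso-8859-1, and that undoing the wrong decoding step recovers the text:

```python
# Illustration only: reproduce a UTF-8 / iso-8859-1 mismatch.
original = "appelé au téléphone"

# The server sends the text as UTF-8 bytes (each "é" becomes two bytes).
utf8_bytes = original.encode("utf-8")

# A client that assumes iso-8859-1 turns every byte into one character,
# so each two-byte "é" shows up as two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# Reversing the wrong step recovers the original text.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired)
```

If the text you see in the browser looks like the first line printed, the bytes on the wire are probably fine and only the declared encoding is wrong.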

You can force the encoding to utf-8 in the browser; this is usually a menu item 
(in Chrome/Safari/Firefox).

FWIW having messed around with this kind of stuff in the past, I always 
generate utf-8 and always set the HTML content type to utf-8 with:

        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
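
As a rough sketch of "generate utf-8 and declare utf-8", something like the following Python keeps the bytes, the HTTP header and the meta tag in agreement. The function name build_html_response and the page layout are hypothetical, made up for this example, and not part of any framework:

```python
from html import escape


def build_html_response(text):
    """Return (headers, body) for an HTML page encoded and declared as UTF-8.

    Hypothetical helper for illustration only.
    """
    html = (
        "<!DOCTYPE html>\n"
        '<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8" /></head>'
        f"<body><p>{escape(text)}</p></body></html>"
    )
    # Encode once, as UTF-8, and declare the same charset in the header.
    body = html.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_html_response("appelé au téléphone")
print(headers["Content-Type"])
```

The point is only that the charset in the header, the charset in the meta tag and the encoding actually used for the bytes all say the same thing.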

Cheers

François


On Jul 29, 2014, at 3:59 PM, Gulliver Smith <gulliver.m.sm...@gmail.com> wrote:

> Thanks for the information about URIEncoding="UTF-8" in the tomcat
> conf file, but that doesn't answer my main concerns:
> - what is the character encoding of the text in the title_fr field?
> - is there any way to force it to be UTF-8?
> 
> On Tue, Jul 29, 2014 at 8:35 AM,  <aurelien.mazo...@francelabs.com> wrote:
>> Hi,
>> 
>> If you use solr 4.8.1, you don't have to add URIEncoding="UTF-8" in the
>> tomcat conf file anymore :
>> https://wiki.apache.org/solr/SolrTomcat
>> 
>> 
>> Regards,
>> 
>> Aurélien MAZOYER
>> 
>> 
>> On 29.07.2014 14:22, Gulliver Smith wrote:
>>> 
>>> I have solr 4.8.1 under Tomcat 7 on Debian Linux. The connector in
>>> Tomcat's server.xml has been changed to include character encoding
>>> UTF-8:
>>> 
>>> <Connector port="8080" protocol="HTTP/1.1"
>>>               URIEncoding="UTF-8"
>>>               connectionTimeout="20000"
>>>               redirectPort="8443" />
>>> 
>>> 
>>> I am posting to the server from PHP 5.5 curl. The extract POST was
>>> intercepted and confirmed that everything is being encode in UTF-8.
>>> 
>>> However, the responses to query commands, whether XML or JSON are
>>> returning field values such as title_fr in something that looks like
>>> latin1 or iso-8859-1 when displayed in a browser or editor.
>>> 
>>> E.g.: "title_fr":[" appelé au téléphone"]
>>> 
>>> The highlights in the query response do have correctly displaying
>>> character codes.
>>> 
>>> E.g. "text_fr":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n \nappelé au
>>> téléphone\nappelé au téléphone\n
>>> 
>>> PHP's utf8_decode doesn't make sense of the title_fr.
>>> 
>>> Is there something to configure to fix this and get proper UTF8
>>> results for everything?
>>> 
>>> Thanks
>>> Gulliver
