Thanks for the information about URIEncoding="UTF-8" in the Tomcat
configuration file, but that doesn't answer my main concerns:
- what is the character encoding of the text in the title_fr field?
- is there any way to force it to be UTF-8?
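
One way to check the first question from the client side: the UTF-8 encoding of "é" is the two bytes C3 A9, and if those bytes are read as Latin-1 they show up as "Ã©". The Python sketch below (my own hypothetical helpers, not part of Solr or PHP) tests whether a returned title_fr value round-trips that way and, if so, undoes one layer of the damage. It is only a heuristic: a genuine Latin-1 string whose bytes happen to form valid UTF-8 would also be "repaired".

```python
def looks_double_encoded(s: str) -> bool:
    """Heuristic: True if s looks like UTF-8 bytes that were mis-read as Latin-1."""
    try:
        # Map each character back to one byte, then reinterpret the bytes as UTF-8.
        repaired = s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either s has characters outside Latin-1, or its bytes are not valid UTF-8.
        return False
    return repaired != s


def repair_double_encoding(s: str) -> str:
    """Undo one layer of UTF-8-read-as-Latin-1 mojibake, or return s unchanged."""
    if looks_double_encoded(s):
        return s.encode("latin-1").decode("utf-8")
    return s
```

For example, `repair_double_encoding("appelÃ© au tÃ©lÃ©phone")` returns "appelé au téléphone", while an already-correct "appelé au téléphone" is returned unchanged.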

On Tue, Jul 29, 2014 at 8:35 AM,  <aurelien.mazo...@francelabs.com> wrote:
> Hi,
>
> If you use Solr 4.8.1, you don't have to add URIEncoding="UTF-8" in the
> Tomcat conf file anymore:
> https://wiki.apache.org/solr/SolrTomcat
>
>
> Regards,
>
> Aurélien MAZOYER
>
>
> On 29.07.2014 14:22, Gulliver Smith wrote:
>>
>> I have Solr 4.8.1 under Tomcat 7 on Debian Linux. The connector in
>> Tomcat's server.xml has been changed to include character encoding
>> UTF-8:
>>
>>  <Connector port="8080" protocol="HTTP/1.1"
>>                URIEncoding="UTF-8"
>>                connectionTimeout="20000"
>>                redirectPort="8443" />
>>
>>
>> I am posting to the server from PHP 5.5 curl. I intercepted the extract
>> POST and confirmed that everything is being encoded in UTF-8.
>>
>> However, the responses to queries, whether XML or JSON, return field
>> values such as title_fr in something that looks like Latin-1
>> (ISO-8859-1) when displayed in a browser or editor.
>>
>> E.g.: "title_fr":[" appelé au téléphone"]
>>
>> The highlights in the query response, on the other hand, display the
>> accented characters correctly.
>>
>> E.g. "text_fr":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n \nappelé au
>> téléphone\nappelé au téléphone\n
>>
>> PHP's utf8_decode doesn't make sense of the title_fr.
>>
>> Is there something to configure to fix this and get proper UTF-8
>> results for everything?
>>
>> Thanks
>> Gulliver
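
For comparison with the PHP curl request described above, here is a Python sketch of the same round trip. It assumes title_fr is sent as a literal.title_fr parameter on the query string of the /update/extract handler (the thread does not say how the field is set), and the host, port and core name in SOLR_EXTRACT_URL are placeholders. It percent-encodes the parameters as UTF-8 and decodes the response body explicitly as UTF-8, assuming Solr wrote it in that encoding, instead of relying on a default charset.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint: adjust host, port and core name for your installation.
SOLR_EXTRACT_URL = "http://localhost:8080/solr/collection1/update/extract"


def build_extract_url(doc_id: str, title_fr: str) -> str:
    """Build an /update/extract URL with literal.* parameters percent-encoded as UTF-8."""
    params = {
        "literal.id": doc_id,
        "literal.title_fr": title_fr,  # assumption: title_fr is supplied as a literal
        "commit": "true",
    }
    # With encoding="utf-8", "é" becomes "%C3%A9" rather than the Latin-1 "%E9".
    return SOLR_EXTRACT_URL + "?" + urlencode(params, encoding="utf-8")


def decode_solr_json(raw: bytes) -> dict:
    """Decode a raw JSON response body, assuming Solr wrote it as UTF-8."""
    return json.loads(raw.decode("utf-8"))
```

`build_extract_url("doc1", "appelé au téléphone")` produces a query string containing `literal.title_fr=appel%C3%A9+au+t%C3%A9l%C3%A9phone`; UTF-8 percent-encoding of this kind is what Tomcat's URIEncoding="UTF-8" setting expects to decode.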
