It's not clear to me from any of the comments you've made in this thread 
whether you've ever confirmed *exactly* what you are getting back from 
solr, ignoring the PHP completely. (ie: you refer to "UTF-8 for all of the 
web pages", suggesting you are only looking at some web application which 
is consuming data from solr)

What do you see when you use something like curl to talk to solr directly 
and inspect the raw bytes (in both directions)?

For example...

$ echo '[{"id":"HOSS","fr_s":"téléphone"}]' > french.json
$ # sanity check that my shell didn't bork the utf8
$ cat french.json | uniname -ap
character  byte       UTF-32   encoded as     glyph   name
       23         23  0000E9   C3 A9          é      LATIN SMALL LETTER E WITH ACUTE
       25         26  0000E9   C3 A9          é      LATIN SMALL LETTER E WITH ACUTE
$ curl -sS -X POST 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' -d @french.json
{"responseHeader":{"status":0,"QTime":445}}
$ curl -sS 'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&omitHeader=true&indent=true'
{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"HOSS",
        "fr_s":"téléphone",
        "_version_":1475795659384684544}]
  }}
$ curl -sS 'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&omitHeader=true&indent=true' \
    | uniname -ap
character  byte       UTF-32   encoded as     glyph   name
       94         94  0000E9   C3 A9          é      LATIN SMALL LETTER E WITH ACUTE
       96         97  0000E9   C3 A9          é      LATIN SMALL LETTER E WITH ACUTE
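
If you don't have uniname handy, the same byte-level check can be roughed 
out in a few lines of Python. This is only a sketch: classify_bytes is a 
name I made up, and it assumes you've saved the raw response body to a 
file (or otherwise have it as bytes) and know the string you expect to 
find in it. It looks for the three most common shapes: correct UTF-8 
(C3 A9 for "é"), Latin-1 (a lone E9), and UTF-8 that was misread as 
Latin-1 and encoded a second time (C3 83 C2 A9).

```python
def classify_bytes(raw: bytes, expected: str) -> str:
    """Report which encoding of `expected` appears in the raw response bytes."""
    utf8 = expected.encode("utf-8")
    # UTF-8 bytes misread as Latin-1, then encoded as UTF-8 a second time
    double = utf8.decode("latin-1").encode("utf-8")
    # Single-byte Latin-1 form (characters outside Latin-1 become "?")
    latin1 = expected.encode("latin-1", errors="replace")

    if utf8 == latin1:
        # Pure ASCII looks identical in every one of these encodings
        return "ascii-only: cannot tell"
    if double in raw:
        return "double-encoded utf-8"
    if utf8 in raw:
        return "utf-8"
    if latin1 in raw:
        return "latin-1"
    return "not found"


# Hypothetical sample bytes standing in for a saved curl response body
good = '{"fr_s":"téléphone"}'.encode("utf-8")
print(classify_bytes(good, "téléphone"))
```

To run it against a real response, redirect the curl output to a file and 
pass open(path, "rb").read() as the first argument, so nothing decodes the 
bytes before you look at them.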



One other cool diagnostic trick you can use, if the data coming back 
over the wire is definitely no longer utf8, is to leverage the "python" 
response writer, because it generates "\uXXXX" escape sequences for 
non-ASCII characters in strings at the solr level -- if those are correct, 
that helps you clearly identify that it's the HTTP layer where your values 
are getting corrupted...

$ curl -sS 'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=python&omitHeader=true&indent=true'
{
  'response':{'numFound':1,'start':0,'docs':[
      {
        'id':'HOSS',
        'fr_s':u't\u00e9l\u00e9phone',
        '_version_':1475795807492898816}]
  }}
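
Since the wt=python output is a plain Python literal made up of ASCII 
escapes, you can also check it mechanically instead of by eye. A minimal 
sketch, assuming Python 3 and that the response text has been pasted in 
or read from a file (the sample variable below is just the output shown 
above, and parse_python_response is a hypothetical helper name):

```python
import ast


def parse_python_response(text: str) -> dict:
    """Safely evaluate a wt=python response body into a Python dict."""
    # literal_eval only accepts literals, so nothing in the response is executed
    return ast.literal_eval(text)


# Raw string so the \u00e9 escapes reach the parser exactly as solr sent them
sample = r"""{
  'response':{'numFound':1,'start':0,'docs':[
      {
        'id':'HOSS',
        'fr_s':u't\u00e9l\u00e9phone',
        '_version_':1475795807492898816}]
  }}"""

doc = parse_python_response(sample)["response"]["docs"][0]
# Compare the decoded field with the value that was originally indexed
print(doc["fr_s"] == "téléphone")
print(doc["fr_s"].encode("utf-8").hex(" "))
```

If the comparison holds here but the JSON response shows different bytes, 
the stored value is fine and the damage is happening somewhere between 
solr's response writer and whatever is reading the HTTP response.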


-Hoss
http://www.lucidworks.com/
