On 5/25/07, Ethan Gruber <[EMAIL PROTECTED]> wrote:
Posting utf8-example.xml is the first thing I tried when I ran into this
problem, and like the other files I had been working with, query results
return garbage characters in place of the Unicode characters.

After posting utf8-example.xml, try this query:

http://localhost:8983/solr/select?indent=on&q=id%3AUTF8TEST&fl=features&wt=python

The Python response writer uses Unicode escapes to keep the output in the
ASCII range, so it's an easy way to see exactly what Solr thinks those
characters are.
You should get:

{
'responseHeader':{
 'status':0,
 'QTime':0,
 'params':{
        'wt':'python',
        'indent':'on',
        'q':'id:UTF8TEST',
        'fl':'features'}},
'response':{'numFound':1,'start':0,'docs':[
        {
         'features':[
          'No accents here',
          u'This is an e acute: \u00e9',
          u'eaiou with circumflexes: \u00ea\u00e2\u00ee\u00f4\u00fb',
          u'eaiou with umlauts: \u00eb\u00e4\u00ef\u00f6\u00fc',
          'tag with escaped chars: <nicetag/>',
          'escaped ampersand: Bonnie & Clyde']}]
}}

If you do, that means the problem is not in getting the data into
Solr, but in how whatever reads the response interprets what comes out.
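
If it helps to check this programmatically, here is a minimal sketch,
assuming Python 3.3 or later (where the u'' prefix is accepted). It parses
an abbreviated copy of the expected output above and lists the non-ASCII
code points in each feature. The helper names feature_codepoints and
misread_as_latin1 are hypothetical, and the Latin-1 misread is only one
possible cause of garbage characters, not a diagnosis of this case.

```python
import ast

# Abbreviated wt=python response body, copied from the expected output above.
# The raw string keeps the \uXXXX escapes exactly as they appear in the response.
SAMPLE = r"""{
'response':{'numFound':1,'start':0,'docs':[
        {
         'features':[
          'No accents here',
          u'This is an e acute: \u00e9']}]
}}"""


def feature_codepoints(response_text):
    """Return (feature, [non-ASCII code points]) pairs for every feature value."""
    # literal_eval only evaluates Python literals, so it never runs code.
    data = ast.literal_eval(response_text)
    pairs = []
    for doc in data['response']['docs']:
        for feature in doc['features']:
            points = ['U+%04X' % ord(ch) for ch in feature if ord(ch) > 127]
            pairs.append((feature, points))
    return pairs


def misread_as_latin1(text):
    """Simulate a client that decodes UTF-8 bytes as Latin-1 (assumed failure mode)."""
    return text.encode('utf-8').decode('latin-1')


if __name__ == '__main__':
    for feature, points in feature_codepoints(SAMPLE):
        print(points, repr(feature))
    # What an e acute looks like after the assumed mis-decoding.
    print(repr(misread_as_latin1('\u00e9')))
```

If the code points match the expected output but the characters still
display as garbage, the client side (browser, terminal, or HTTP library
encoding) is the place to look.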

-Yonik
