On 5/25/07, Ethan Gruber <[EMAIL PROTECTED]> wrote:
> Posting utf8-example.xml is the first thing I tried when I ran into this problem, and like the other files I had been working with, query results return garbage characters instead of Unicode.

After posting utf8-example.xml, try this query:

  http://localhost:8983/solr/select?indent=on&q=id%3AUTF8TEST&fl=features&wt=python

The python writer uses unicode escapes to keep the output in the ASCII range, so it's an easy way to see exactly what Solr thinks those characters are.

You should get:

  {
   'responseHeader':{
    'status':0,
    'QTime':0,
    'params':{
     'wt':'python',
     'indent':'on',
     'q':'id:UTF8TEST',
     'fl':'features'}},
   'response':{'numFound':1,'start':0,'docs':[
     {
      'features':[
       'No accents here',
       u'This is an e acute: \u00e9',
       u'eaiou with circumflexes: \u00ea\u00e2\u00ee\u00f4\u00fb',
       u'eaiou with umlauts: \u00eb\u00e4\u00ef\u00f6\u00fc',
       'tag with escaped chars: <nicetag/>',
       'escaped ampersand: Bonnie & Clyde']}]
   }}

If you do, that means that the problem is not getting the data into Solr, but the interpretation of what you get out.

-Yonik
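
If comparing the escapes by eye is tedious, a small script can do the check. What follows is only a rough sketch, assuming Python 3 and assuming the wt=python response has already been saved as text. The names SAMPLE, parse_python_response, and describe_non_ascii are made up for illustration and are not part of Solr; SAMPLE is an abbreviated copy of the expected output above.

```python
import ast
import unicodedata

# Abbreviated copy of the expected wt=python response (illustrative only).
SAMPLE = r"""{
 'responseHeader':{'status':0,'QTime':0},
 'response':{'numFound':1,'start':0,'docs':[
   {'features':[
     'No accents here',
     u'This is an e acute: \u00e9',
     u'eaiou with circumflexes: \u00ea\u00e2\u00ee\u00f4\u00fb']}]
 }}"""


def parse_python_response(text):
    """Parse wt=python output as a Python literal, without running eval()."""
    return ast.literal_eval(text)


def describe_non_ascii(value):
    """Return (code point, Unicode name) pairs for each non-ASCII character."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
            for ch in value if ord(ch) > 127]


features = parse_python_response(SAMPLE)["response"]["docs"][0]["features"]
for feature in features:
    # Print an ASCII-safe form so the terminal's encoding cannot confuse things.
    safe = feature.encode("ascii", "backslashreplace").decode("ascii")
    print(safe, describe_non_ascii(feature))
```

If the names reported for your own response match the characters you indexed (for example LATIN SMALL LETTER E WITH ACUTE for the e acute), the data in Solr is fine and the garbage is being introduced by whatever displays or decodes the output.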