(Sorry, the last one was posted by mistake.) Here I am again with charset encoding problems:
I need to store XML in a document field. I declare the field as a string and wrap the value in CDATA when I post the add XML.

The problem is that the XML contains some international characters, say ì or à, and also € (I don't know if you can read these). When I get the XML field back from Solr, strange things happen:

- First: € gets converted to ? (I can see this in the index when I look at it with Luke).

- Second: if there is an ì (accented i), I get malformed XML back, in both Firefox and IE:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<responseHeader><status>0</status><QTime>0</QTime></responseHeader>
<result numFound="1" start="0">
<doc>
<str name="categoryid">/relazioni/</str>
<str name="facetXML"><?xml version="1.0" encoding="UTF-8"?><xml>
<filter field="typecamper_s">
<item value="autocaravanmansardato">Autocaravan ìMansardato</item>
                                                ^ HERE begins the problem: from now on "<" is no longer escaped
<item value="semintegrale">Semintegrale</item>
</filter>
</xml>

HERE the output continues; it should have been escaped, but is not, because of the problem above:

</item><item value="semintegrale">Semintegrale</item></filter>
</xml>
</str>
...
</doc>
</result>
</response>

But if I get the same document in my request handler (as a Document structure), I have no problem parsing the XML and I get the correct characters. I have traced XML.escape and the problem is not there, so it must be somewhere between XMLWriter and Jetty (I've tried the latest Jetty, 5.1.11).

- Third: if I put international characters in a normal string field, I see that Solr stores what I think is the UTF-8 encoded character, the same as it does for a text field type.

The question is: apart from the malformed XML issue, what is the best way to deal with international charsets?

Thank you,
Fabio
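
For reference, here is a minimal Java sketch of one way € can turn into ? while ì survives. It assumes only that some step between the client and the index encodes the text with a charset that has no € (ISO-8859-1 is used here as an example). Nothing above confirms that this is what actually happens in Solr or Jetty. The class name `CharsetRoundTrip` and the method `roundTrip` are made up for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encode the text to bytes and decode it again with the same charset,
    // mimicking a client that serializes the add XML before sending it.
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // \u00ec is ì, \u20ac is €
        String sample = "Autocaravan \u00ecMansardato \u20ac";

        // UTF-8 can represent every character, so nothing is lost.
        String viaUtf8 = roundTrip(sample, StandardCharsets.UTF_8);

        // ISO-8859-1 contains ì but not €; String.getBytes replaces
        // characters it cannot map with the charset's replacement byte, '?'.
        String viaLatin1 = roundTrip(sample, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip intact:   " + viaUtf8.equals(sample));
        System.out.println("Latin-1 keeps the i-grave: " + (viaLatin1.indexOf('\u00ec') >= 0));
        System.out.println("Latin-1 ends with '?':     " + viaLatin1.endsWith("?"));
    }
}
```

If the posted bytes are UTF-8 and are also declared as UTF-8 (in the XML declaration and in the HTTP Content-Type), this particular loss should not occur. Whether that also explains the malformed response is a separate question.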