OK, thanks to your posts I've read up on the basics of encoding and made some
changes to my code. It's all much clearer now, but I still have some
problems.

This is what I do (in case it helps someone with the same problems I had):

- I get the data from the DB, telling the JDBC connector to use UTF-8.

- Then I convert it to Java's internal String encoding (UTF-16, as I have
learned) in this way:

                new String(rs.getBytes(rsField), "UTF-8")

This way I get the UTF-8 byte array from my result set (from MySQL) and
tell the String constructor that the array is to be interpreted as UTF-8.
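
To make the decoding step concrete, here is a minimal sketch of what the String constructor does with UTF-8 bytes. The class name Utf8Decode and the helper decodeUtf8 are made up for illustration, and the byte array stands in for what rs.getBytes(rsField) would return; the second decode shows what happens if the same bytes are read with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Decode {
    // Decode raw UTF-8 bytes (e.g. from ResultSet.getBytes) into a Java String.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "città" as UTF-8: the à (U+00E0) takes two bytes, C3 A0.
        byte[] raw = {'c', 'i', 't', 't', (byte) 0xC3, (byte) 0xA0};

        String good = decodeUtf8(raw);
        // Same bytes read with the wrong charset: each byte becomes one char.
        String bad = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(good.length()); // 5
        System.out.println(bad.length());  // 6
        System.out.println(good.equals("citt\u00E0")); // true
    }
}
```

If the length comes out one larger than expected for each accented character, the bytes were most likely decoded with a single-byte charset somewhere along the way.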

When I have to write the update XML document to Solr:

                URLConnection conn = url.openConnection();
                conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
                conn.setDoOutput(true);
                wr = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
                wr.write(data);
                wr.flush();

So I'm sure everything is converted back to UTF-8 when writing to the Solr
update URL.
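
One way to double-check that claim without a running server is to point the same OutputStreamWriter at an in-memory stream and look at the bytes. This is only a sketch: the class Utf8WriteCheck and its helpers encodeAsSent and hex are hypothetical names, and a ByteArrayOutputStream stands in for conn.getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriteCheck {
    // Encode the text the same way the update request does, but into memory.
    static byte[] encodeAsSent(String data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer wr = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            wr.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the bytes are read.
        return sink.toByteArray();
    }

    // Render bytes as space-separated upper-case hex for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeAsSent("\u20AC"))); // E2 82 AC (Euro sign)
        System.out.println(hex(encodeAsSent("\u00E0"))); // C3 A0 (à)
    }
}
```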

This way everything is fine when getting normal fields from documents (we get
back all our diacritical characters and the Euro sign)... but:

- I cannot search using diacritical characters.
If I have a doc with a field containing "città", I cannot get it back with
q=field:città (in the URL the à gets percent-encoded as E0, like this:
"citt%E0"; note that E0 is the ISO-8859-1 byte for à, while its UTF-8
encoding would be "%C3%A0").
The strange thing is that with an old Solr on Jetty 6.0.beta, searching
with diacritical characters worked, but the responses came back from Solr
doubly UTF-8 encoded (we had to decode twice). With the latest version of
Solr on Jetty 5.1.X, responses are UTF-8 encoded once (as you would expect),
but searching with diacritical characters does not work. Is there a
particular way to do this?
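
For comparison, this small sketch shows how the same query value is percent-encoded under the two charsets, using the standard java.net.URLEncoder. The class name SolrQueryEncoding and the helper encode are made up for illustration. Which charset the servlet container uses to decode the query string is a container setting that I am not certain about for these Jetty versions, so treat this only as a way to see what is actually being sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrQueryEncoding {
    // Percent-encode a query value using the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String q = "field:citt\u00E0"; // field:città

        // UTF-8: à becomes two bytes, C3 A0.
        System.out.println("q=" + encode(q, "UTF-8"));      // q=field%3Acitt%C3%A0
        // ISO-8859-1: à becomes the single byte E0.
        System.out.println("q=" + encode(q, "ISO-8859-1")); // q=field%3Acitt%E0
    }
}
```

So a URL containing "citt%E0" was encoded as ISO-8859-1, not UTF-8, which may explain a mismatch if the server side expects UTF-8.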

- I still have problems getting back fields stored as XML that contain
diacritical characters. I've followed your advice and escaped the < sign
myself, but the result is the same as using CDATA (I don't use DOM here).
By the way, why did you say not to use CDATA?
I get the same malformed XML problem I showed you in my first post.
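
For reference, the kind of escaping I mean looks roughly like the sketch below. The class XmlEscape, the helper escapeXml and the field name "xml" are hypothetical, not my real code. It escapes the three markup characters and leaves characters such as à and € untouched, so they still depend on the document being written as UTF-8.

```java
public class XmlEscape {
    // Escape the characters that would otherwise be read as XML markup.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Embedded XML with a diacritical character, an ampersand and a Euro sign.
        String embedded = "<nome>citt\u00E0 & \u20AC</nome>";
        // "xml" is a made-up field name used only for this example.
        System.out.println("<field name=\"xml\">" + escapeXml(embedded) + "</field>");
    }
}
```

Note that & has to be escaped along with <, otherwise any ampersand in the embedded XML makes the update document malformed.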

Thank you again

   Fabio
