It looks like you have a double-encoding problem. It might be because you fetch UTF-8 binary data from MySQL (I know, for instance, that the Perl driver has an issue with that) and then encode it a second time as UTF-8 when you post to Solr.
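This minimal Java sketch reproduces the effect and shows why it matches the output quoted below. Java is used only because Solr, Tomcat and Connector/J are Java, and the class and method names are hypothetical. The sketch assumes the faulty step is a Latin-1 decode of UTF-8 bytes. In that case the two bytes of "é" (0xC3 0xA9) turn into the two characters "Ã©", which the archive renders as "Ã(c)". Encoding that string as UTF-8 again gives 8 bytes instead of 6.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: reproduces double encoding by reading UTF-8 bytes
// back as Latin-1, then encoding the result as UTF-8 a second time.
final class DoubleEncodingDemo {

    // The mistake: treat each UTF-8 byte as if it were one Latin-1 character.
    static String misDecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes exactly one wrong Latin-1 decode. This is only safe when you
    // know that this single mistake is what happened to the string.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "allée", written as an escape to avoid source-file encoding issues.
        String word = "all\u00e9e";
        String garbled = misDecode(word);
        // This is what reaches the index once it is encoded as UTF-8 for the post.
        System.out.println(garbled);
        System.out.println(repair(garbled).equals(word));
        System.out.println(word.getBytes(StandardCharsets.UTF_8).length + " bytes vs "
                + garbled.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

The repair() helper is a diagnostic aid only. The real fix is to stop the wrong decode at the driver or connection level instead of patching strings afterwards.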
Make sure the strings you're getting from MySQL are actually proper Unicode strings and not the raw UTF-8-encoded binary form. You may want to have a look at http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html for the proper option to use with your connection.

To check that you're posting actual UTF-8 data to Solr, you can dump your XML post to a file (don't forget to set the encoding to UTF-8). Then check whether that file reads correctly in any UTF-8-aware editor.

Cheers,
Jerome.

On Tue, Oct 21, 2008 at 10:43 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I've solr 1.3 and tomcat55.
> When I try to index a bit of data and I request ALL, obviously my accent and
> UTF8 encoding is not took in consideration.
> <doc>
> <date name="created">2006-12-14T15:28:27Z</date>
> <str name="description_ja">
> Le 1er film de Goro Miyazaki (fils de Hayao)
> <br />je suis allÃ(c)e ...
> ....
> <str name="title_ja">渡邊 å‰ å· vs 三ç"°ä¸‹ç"° 1</str>
>
>
> My database Mysql is well in UTF8, if I request data manually from mysql I
> will get accent even japan characters properly
>
> I index my data, my data-config is :
> <dataSource type="JdbcDataSource"
> driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://master-spare.videos.com/videos"
> user="solr"
> password="pass"
> batchSize="-1"
> responseBuffering="adaptive"/>
>
> My schema config file start by : <?xml version="1.0" encoding="UTF-8" ?>
>
> I've add in my server.xml : because my localhost point on 8180
> <Connector port="8180" maxHttpHeaderSize="8192"
> maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
> enableLookups="false" redirectPort="8443" acceptCount="100"
> connectionTimeout="20000" disableUploadTimeout="true"
> URIEncoding="UTF-8" useBodyEncodingForURI="true" />
>
> What can I check?
> I'm using a linux server.
> If I do dpkg-reconfigure -plow locales
> Generating locales...
> fr_BE.UTF-8... up-to-date
> fr_CA.UTF-8... up-to-date
> fr_CH.UTF-8... up-to-date
> fr_FR.UTF-8... up-to-date
> fr_LU.UTF-8... up-to-date
> Generation complete.
>
> Would that be a problem, I would say no but maybe, do I miss a package???
>
>
>
> --
> View this message in context:
> http://www.nabble.com/tomcat55-solr1.3---Indexing-data%2C-doesnt-take-in-consideration-utf8%21-tp20086167p20086167.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

--
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]