Hi Jerome, I tried to chat with you on your website, but you weren't there or ...?? lol
OK, I tried what you suggested, and this is what my file shows in gedit:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int><lst name="params"><str name="q">ALL</str></lst></lst><result name="response" numFound="3" start="0"><doc><date name="created">2006-10-10T05:29:32Z</date><str name="description_ja">All Japan Women's Pro-wrestling <br /><br />WWWA Champion Title Match <br /><br />è±ç”°çœŸå¥ˆç¾Ž VS 井上京å <br /><br /></str><int name="id">813343</int><str name="language">JA</str><int name="rating_binrate">40</int><arr name="spell"><str>Toyota Manami VS Inoue Kyoko</str></arr><int name="stat_views">1422</int>....

and only this in OpenOffice:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int><lst name="params"><str name="q">ALL</str></lst></lst><result name="response" numFound="3" start="0"><doc><date name="created">2006-10-10T05:29:32Z</date><str name="description_ja">All Japan Women's Pro-wrestling

:( I don't know!

Jérôme Etévé wrote:
>
> Looks like you have a double encoding problem.
>
> It might be because you fetch UTF-8 binary data from mysql (I know
> that for instance the perl driver has an issue with that) and you then
> encode it a second time in UTF-8 when you post to solr.
>
> Make sure the strings you're getting from mysql are actually proper
> unicode strings and not the raw UTF-8 encoded binary form.
>
> You may want to have a look at
> http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html
> for the proper option to use with your connection.
>
> To check that you're posting actual UTF-8 data to solr, you can
> dump your xml post to a file (don't forget to set the input
> encoding to UTF-8). Then you can check if this file is readable with
> any UTF-8 aware editor.
>
> Cheers,
>
> Jerome.
>
> On Tue, Oct 21, 2008 at 10:43 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I've got solr 1.3 and tomcat55.
>> When I index a bit of data and then query for ALL, my accents and
>> UTF-8 encoding are obviously not taken into account:
>> <doc>
>> <date name="created">2006-12-14T15:28:27Z</date>
>> <str name="description_ja">
>> Le 1er film de Goro Miyazaki (fils de Hayao)
>> <br />je suis allÃ(c)e ...
>> ....
>> <str name="title_ja">渡邊 å‰ å· vs 三ç"°ä¸‹ç"° 1</str>
>>
>> My MySQL database is definitely in UTF-8: if I query the data manually
>> in mysql, I get the accents and even the Japanese characters properly.
>>
>> To index my data, my data-config is:
>> <dataSource type="JdbcDataSource"
>>     driver="com.mysql.jdbc.Driver"
>>     url="jdbc:mysql://master-spare.videos.com/videos"
>>     user="solr"
>>     password="pass"
>>     batchSize="-1"
>>     responseBuffering="adaptive"/>
>>
>> My schema config file starts with: <?xml version="1.0" encoding="UTF-8" ?>
>>
>> I've added this to my server.xml, because my localhost points to port 8180:
>> <Connector port="8180" maxHttpHeaderSize="8192"
>>     maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
>>     enableLookups="false" redirectPort="8443" acceptCount="100"
>>     connectionTimeout="20000" disableUploadTimeout="true"
>>     URIEncoding="UTF-8" useBodyEncodingForURI="true" />
>>
>> What can I check? I'm using a Linux server.
>> If I run dpkg-reconfigure -plow locales, I get:
>> Generating locales...
>>   fr_BE.UTF-8... up-to-date
>>   fr_CA.UTF-8... up-to-date
>>   fr_CH.UTF-8... up-to-date
>>   fr_FR.UTF-8... up-to-date
>>   fr_LU.UTF-8... up-to-date
>> Generation complete.
>>
>> Could that be the problem? I would say no, but maybe. Am I missing a package???
>>
>> --
>> View this message in context:
>> http://www.nabble.com/tomcat55-solr1.3---Indexing-data%2C-doesnt-take-in-consideration-utf8%21-tp20086167p20086167.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Jerome Eteve.
>
> Chat with me live at http://www.eteve.net
>
> [EMAIL PROTECTED]
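The double encoding Jerome describes can be reproduced in a few lines of Java. This is a minimal sketch, not code from Solr or MySQL Connector/J: the class DoubleEncoding and its methods are hypothetical, and it assumes the stray step decodes the UTF-8 bytes as ISO-8859-1 (Latin-1). Under that assumption "allée" turns into "allÃ©e", the same pattern as the "allÃ(c)e" in the original message, and the damage can be undone byte for byte. MySQL's own "latin1" is really Windows-1252, so if the garbling happened there (the ” in the Japanese sample hints at that), the exact repair may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that reproduces and undoes UTF-8 -> Latin-1 -> UTF-8
// double encoding. Not part of Solr or MySQL Connector/J.
final class DoubleEncoding {

    // Simulates the bug: the UTF-8 bytes of `text` are decoded as ISO-8859-1,
    // one char per byte, so U+00E9 (bytes C3 A9) becomes the two chars
    // U+00C3 U+00A9. Posting that result to Solr as UTF-8 is the second encoding.
    static String doubleEncode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes it: ISO-8859-1 maps U+0000..U+00FF back to the same byte values,
    // which recovers the original UTF-8 bytes. Only correct if the damage
    // really went through ISO-8859-1.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "all\u00e9e";            // the word from the thread
        String garbled = doubleEncode(word);
        System.out.println(garbled.equals("all\u00c3\u00a9e")); // true
        System.out.println(repair(garbled).equals(word));        // true
    }
}
```

Repairing data afterwards is a last resort; the cleaner fix is to get proper strings out of the JDBC driver in the first place. One commonly used option with MySQL Connector/J is to add the useUnicode=true and characterEncoding=UTF-8 connection properties to the JDBC URL (untested against this particular setup). Inside data-config.xml the & has to be escaped, e.g. url="jdbc:mysql://master-spare.videos.com/videos?useUnicode=true&amp;characterEncoding=UTF-8".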
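Jerome's check (dump the XML post to a file, then open it in a UTF-8-aware editor) can also be scripted. In the sketch below, the class Utf8Check and the default file name post.xml are hypothetical. It does two things: a strict UTF-8 decode that fails on malformed bytes, and a heuristic for double encoding. Double-encoded text is still perfectly valid UTF-8, so the strict decode alone would not catch this thread's problem. The heuristic looks for runs of characters in U+0080..U+00FF that, read back as Latin-1 bytes, form valid UTF-8. It assumes the garbling went through ISO-8859-1 and can misfire on unusual legitimate text such as a literal "Ã©".

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for a dumped Solr update post.
final class Utf8Check {

    // Runs of chars in U+0080..U+00FF; under ISO-8859-1 each one is one byte.
    private static final Pattern HIGH_LATIN1_RUN = Pattern.compile("[\\u0080-\\u00FF]+");

    // A fresh decoder that reports malformed input instead of silently
    // replacing it with U+FFFD (which is what new String(bytes, UTF_8) does).
    private static CharsetDecoder strictUtf8() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    // Reads the whole file; throws CharacterCodingException (an IOException)
    // if it is not well-formed UTF-8.
    static String readStrictUtf8(Path path) throws IOException {
        return strictUtf8().decode(ByteBuffer.wrap(Files.readAllBytes(path))).toString();
    }

    static boolean isValidUtf8(byte[] bytes) {
        try {
            strictUtf8().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Heuristic: a correctly decoded U+00E9 alone (byte E9) is not valid
    // UTF-8, but the double-encoded pair U+00C3 U+00A9 (bytes C3 A9) is.
    static boolean looksDoubleEncoded(String text) {
        Matcher m = HIGH_LATIN1_RUN.matcher(text);
        while (m.find()) {
            if (isValidUtf8(m.group().getBytes(StandardCharsets.ISO_8859_1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // "post.xml" is a hypothetical name for the dumped update request.
        Path dump = Paths.get(args.length > 0 ? args[0] : "post.xml");
        try {
            String xml = readStrictUtf8(dump);
            System.out.println(looksDoubleEncoded(xml)
                    ? "Valid UTF-8, but it looks double-encoded"
                    : "Valid UTF-8, no sign of double encoding");
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```

Run it as `java Utf8Check post.xml` after writing the update post to that file. A "looks double-encoded" result on the dump points back at the MySQL side rather than at Tomcat or the locale settings.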