It actually comes from the MySQL variables:

| character_set_client     | latin1 |
| character_set_connection | latin1 |

so I don't really know how to configure my datasource to use latin1 instead of utf8.

sunnyfr wrote:
>
> Hi Jerome,
>
> I tried to chat with you but you wasn't there or ...?? lol on your
> website.
>
> Ok I tried what you did and my file bring me back in gedit :
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int
> name="QTime">0</int><lst name="params"><str
> name="q">ALL</str></lst></lst><result name="response" numFound="3"
> start="0"><doc><date name="created">2006-10-10T05:29:32Z</date><str
> name="description_ja">All Japan Women's Pro-wrestling
> <br /><br />WWWA Champion Title Match
> <br /><br />è±ç”°çœŸå¥ˆç¾Ž VS 井上京å
> <br /><br /></str><int name="id">813343</int><str
> name="language">JA</str><int name="rating_binrate">40</int><arr
> name="spell"><str>Toyota Manami VS Inoue Kyoko</str></arr><int
> name="stat_views">1422</int>....
>
> and just that in open office :
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int
> name="QTime">0</int><lst name="params"><str
> name="q">ALL</str></lst></lst><result name="response" numFound="3"
> start="0"><doc><date name="created">2006-10-10T05:29:32Z</date><str
> name="description_ja">All Japan Women's Pro-wrestling
>
> :( don't know!
>
>
> Jérôme Etévé wrote:
>>
>> Looks like you have a double encoding problem.
>>
>> It might be because you fetch UTF-8 binary data from mysql (I know
>> that for instance the perl driver has an issue with that) and you then
>> encode it a second time in UTF-8 when you post to solr.
>>
>> Make sure the string you're getting from mysql are actually proper
>> unicode strings and not the raw UTF-8 encoded binary form.
>>
>> You may want to have a look at
>> http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html
>> for the proper option to use with your connection.
>>
>> What you can try to check you're posting actual UTF-8 data to solr is
>> to dump your xml post in a file (don't forget to set the input
>> encoding to UTF-8 ). Then you can check if this file is readable with
>> any UTF-8 aware editor.
>>
>> Cheers,
>>
>> Jerome.
>>
>>
>> On Tue, Oct 21, 2008 at 10:43 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> I've solr 1.3 and tomcat55.
>>> When I try to index a bit of data and I request ALL, obviously my accent
>>> and UTF8 encoding is not took in consideration.
>>> <doc>
>>> <date name="created">2006-12-14T15:28:27Z</date>
>>> <str name="description_ja">
>>> Le 1er film de Goro Miyazaki (fils de Hayao)
>>> <br />je suis allÃ(c)e ...
>>> ....
>>> <str name="title_ja">渡邊 å‰ å· vs 三ç"°ä¸‹ç"° 1</str>
>>>
>>>
>>> My database Mysql is well in UTF8, if I request data manually from mysql
>>> I will get accent even japan characters properly
>>>
>>> I index my data, my data-config is :
>>> <dataSource type="JdbcDataSource"
>>> driver="com.mysql.jdbc.Driver"
>>> url="jdbc:mysql://master-spare.videos.com/videos"
>>> user="solr"
>>> password="pass"
>>> batchSize="-1"
>>> responseBuffering="adaptive"/>
>>>
>>> My schema config file start by : <?xml version="1.0" encoding="UTF-8" ?>
>>>
>>> I've add in my server.xml : because my localhost point on 8180
>>> <Connector port="8180" maxHttpHeaderSize="8192"
>>> maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
>>> enableLookups="false" redirectPort="8443"
>>> acceptCount="100"
>>> connectionTimeout="20000" disableUploadTimeout="true"
>>> URIEncoding="UTF-8" useBodyEncodingForURI="true" />
>>>
>>> What can I check?
>>> I'm using a linux server.
>>> If I do dpkg-reconfigure -plow locales
>>> Generating locales...
>>> fr_BE.UTF-8... up-to-date
>>> fr_CA.UTF-8... up-to-date
>>> fr_CH.UTF-8... up-to-date
>>> fr_FR.UTF-8... up-to-date
>>> fr_LU.UTF-8... up-to-date
>>> Generation complete.
>>>
>>> Would that be a problem, I would say no but maybe, do I miss a
>>> package???
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/tomcat55-solr1.3---Indexing-data%2C-doesnt-take-in-consideration-utf8%21-tp20086167p20086167.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Jerome Eteve.
>>
>> Chat with me live at http://www.eteve.net
>>
>> [EMAIL PROTECTED]
>>
>>
>
>

--
View this message in context: http://www.nabble.com/tomcat55-solr1.3---Indexing-data%2C-doesnt-take-in-consideration-utf8%21-tp20086167p20090130.html
Sent from the Solr - User mailing list archive at Nabble.com.
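
For anyone reading this thread later: the double-encoding effect Jerome describes can be reproduced outside Solr and MySQL. The sketch below is only an illustration in Python and is not part of the setup discussed in the thread. The helper names `double_encode` and `repair` are made up for this example, and it assumes the charset the bytes get misread as is latin1, as the MySQL variables quoted at the top suggest. It shows how "allée" ends up as the "allÃ©e" seen in the Solr response (the archive renders the second character as "(c)").

```python
def double_encode(text: str) -> str:
    """Simulate the bug: take the UTF-8 bytes of the text and misread them
    as latin1, which is what happens when a latin1 connection hands over
    UTF-8 data and the result is then treated as ordinary text."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo one round of the misreading above. Raises UnicodeError if the
    input was not garbled in exactly this way (assumption: latin1)."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "je suis allée"
    garbled = double_encode(original)
    print(garbled)          # je suis allÃ©e
    print(repair(garbled))  # je suis allée
```

If `repair` turns a garbled field back into readable text, that is a hint the data was encoded twice somewhere between MySQL and Solr. It is a diagnostic only; the actual fix would still be on the connection settings, as discussed above.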