It actually comes from MySQL's character set variables:
| character_set_client     | latin1 |
| character_set_connection | latin1 |

So now I really don't know how to configure my datasource to use latin1
and not utf8.
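
To make the double-encoding diagnosis quoted below concrete, here is a minimal Python sketch (not part of my Solr setup) of what happens when UTF-8 bytes are read as if they were latin1 characters and then encoded to UTF-8 again. The function names are hypothetical, and Python's "latin-1" codec is only a stand-in for MySQL's latin1, which is really closer to cp1252, so the exact garbage characters can differ.

```python
def double_encode(text: str) -> str:
    """Simulate the bug: the UTF-8 bytes of `text` are misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_double_encoding(garbled: str) -> str:
    """Reverse the misreading; only valid if the text really was double encoded."""
    return garbled.encode("latin-1").decode("utf-8")


# "é" is the two bytes C3 A9 in UTF-8; read as Latin-1 they become "Ã" and "©".
garbled = double_encode("allée")
assert garbled == "allÃ©e"

# Going back through Latin-1 recovers the original text.
assert undo_double_encoding(garbled) == "allée"
```

If the text coming out of the database looks like the garbled form here, the connection charset (and not Solr or Tomcat) would be the first thing to suspect.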


sunnyfr wrote:
> 
> Hi Jerome,
> 
> I tried to chat with you on your website, but you weren't there
> or ...?? lol
> 
> OK, I tried what you suggested, and this is what my file shows in gedit:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int
> name="QTime">0</int><lst name="params"><str
> name="q">ALL</str></lst></lst><result name="response" numFound="3"
> start="0"><doc><date name="created">2006-10-10T05:29:32Z</date><str
> name="description_ja">All Japan Women's Pro-wrestling
> &lt;br /&gt;&lt;br /&gt;WWWA Champion Title Match
> &lt;br /&gt;&lt;br /&gt;豐田真奈美 VS 井上京子
> &lt;br /&gt;&lt;br /&gt;</str><int name="id">813343</int><str
> name="language">JA</str><int name="rating_binrate">40</int><arr
> name="spell"><str>Toyota Manami VS Inoue Kyoko</str></arr><int
> name="stat_views">1422</int>....
> 
> and only this in OpenOffice:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int
> name="QTime">0</int><lst name="params"><str
> name="q">ALL</str></lst></lst><result name="response" numFound="3"
> start="0"><doc><date name="created">2006-10-10T05:29:32Z</date><str
> name="description_ja">All Japan Women's Pro-wrestling
> 
> :( don't know!
> 
> 
> Jérôme Etévé wrote:
>> 
>> Looks like you have a double encoding problem.
>> 
>> It might be because you fetch UTF-8 binary data from MySQL (I know
>> that, for instance, the Perl driver has an issue with that) and you then
>> encode it a second time in UTF-8 when you post to Solr.
>> 
>> Make sure the strings you're getting from MySQL are actually proper
>> Unicode strings and not the raw UTF-8 encoded binary form.
>> 
>> You may want to have a look at
>> http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html
>> for the proper option to use with your connection.
>> 
>> To check that you're posting actual UTF-8 data to Solr, you can
>> dump your XML post to a file (don't forget to set the input
>> encoding to UTF-8). Then you can check whether this file is readable
>> with any UTF-8 aware editor.
>> 
>> Cheers,
>> 
>> Jerome.
>> 
>> 
>> On Tue, Oct 21, 2008 at 10:43 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> I have Solr 1.3 and Tomcat 5.5.
>>> When I index a bit of data and request ALL, my accents and UTF-8
>>> encoding are obviously not taken into account.
>>> <doc>
>>> <date name="created">2006-12-14T15:28:27Z</date>
>>> <str name="description_ja">
>>> Le 1er film de Goro Miyazaki (fils de Hayao)
>>> <br />je suis allÃ(c)e  ...
>>> ....
>>> <str name="title_ja">渡邊 å‰ å·  vs 三ç"°ä¸‹ç"° 1</str>
>>>
>>>
>>> My MySQL database is in UTF-8; if I query the data manually from MySQL,
>>> I get the accents and even the Japanese characters properly.
>>>
>>> I index my data, and my data-config is:
>>>  <dataSource type="JdbcDataSource"
>>>              driver="com.mysql.jdbc.Driver"
>>>              url="jdbc:mysql://master-spare.videos.com/videos"
>>>              user="solr"
>>>              password="pass"
>>>              batchSize="-1"
>>>              responseBuffering="adaptive"/>
>>>
>>> My schema config file starts with: <?xml version="1.0" encoding="UTF-8" ?>
>>>
>>> I've added this to my server.xml, because my localhost listens on port 8180:
>>>    <Connector port="8180" maxHttpHeaderSize="8192"
>>>               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
>>>               enableLookups="false" redirectPort="8443"
>>>               acceptCount="100"
>>>               connectionTimeout="20000" disableUploadTimeout="true"
>>>               URIEncoding="UTF-8" useBodyEncodingForURI="true" />
>>>
>>> What can I check?
>>> I'm using a Linux server.
>>> If I run dpkg-reconfigure -plow locales, I get:
>>> Generating locales...
>>>  fr_BE.UTF-8... up-to-date
>>>  fr_CA.UTF-8... up-to-date
>>>  fr_CH.UTF-8... up-to-date
>>>  fr_FR.UTF-8... up-to-date
>>>  fr_LU.UTF-8... up-to-date
>>> Generation complete.
>>>
>>> Could that be a problem? I would say no, but maybe I'm missing a
>>> package?
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/tomcat55-solr1.3---Indexing-data%2C-doesnt-take-in-consideration-utf8%21-tp20086167p20086167.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
>> 
>> -- 
>> Jerome Eteve.
>> 
>> Chat with me live at http://www.eteve.net
>> 
>> [EMAIL PROTECTED]
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/tomcat55-solr1.3---Indexing-data%2C-doesnt-take-in-consideration-utf8%21-tp20086167p20090130.html
Sent from the Solr - User mailing list archive at Nabble.com.
