Hi, 

I wanted to share a problem I have with importing text in different
languages. All international text looks wrong in Luke and in AJAX Solr.


Chinese and Japanese characters show up like this:
æ˜ ç”»ã‚„éŸ³æ¥½ãŒæ¥½ã—ã„ï¼AIのサイモンのファンです。アダãƒ
やマットが好きです。LeeDeWyze優勝!I

It should read:
映画や音楽が楽しい!AIのサイモンのファンです。アダムやマットが好きです。
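The garbled output has a recognisable pattern: it looks like the UTF-8 bytes of the text decoded as Windows-1252 (cp1252) somewhere along the way. The minimal Java sketch below reproduces this for the first five characters of the expected text (映画や音楽) and then reverses it. The class and method names are hypothetical, for illustration only. If the round trip restores the original, the bytes are probably intact and one hop is simply reading them with the wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Mojibake {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode text as UTF-8, then (wrongly) decode those bytes as cp1252.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Undo the mistake: recover the original bytes, then decode them as UTF-8.
    // Lossy if the UTF-8 bytes included values cp1252 leaves undefined
    // (0x81, 0x8D, 0x8F, 0x90, 0x9D), which Java decodes as U+FFFD.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first five characters of the expected text, written as escapes
        // so the source compiles regardless of javac's default encoding.
        String original = "\u6620\u753b\u3084\u97f3\u697d";
        String garbled = misdecode(original);
        System.out.println(garbled);          // should match the start of the garbled line (UTF-8 console)
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```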

My setup is Ubuntu Server 10.04, Tomcat 6, Solr 1.4 and MySQL.

Things I have already configured, without any luck:
 1. /etc/tomcat6/server.xml contains this connector:
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               URIEncoding="UTF-8"
               redirectPort="8443" />
 2. /etc/mysql/my.cnf contains:
    [mysqld]
    ....
    default-character-set = utf8
    character-set-server = utf8
  
 3. /etc/solr/conf/data-config.xml contains:
    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost:3306/spuvocom_spuvo?characterEncoding=UTF-8"
                  encoding="UTF-8" />
      <document>
 4. My MySQL table collation is utf8_bin.
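On item 1: URIEncoding controls how Tomcat decodes percent-escapes in the request URL, such as the q parameter of a search, so presumably it affects querying rather than the text DIH reads over JDBC. With the attribute unset, Tomcat 6 falls back to ISO-8859-1. The Java sketch below uses java.net.URLDecoder as a stand-in for Tomcat's own decoder (an approximation, not Tomcat's code path) to show what the setting changes. The class name is hypothetical, and the Charset overloads assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only (Java 10+).
class UriEncodingDemo {
    // Clients percent-encode the UTF-8 bytes of a query term.
    static String encodeQuery(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Approximates the container decoding the escapes with a given charset.
    static String decodeAs(String escaped, String charsetName) {
        return URLDecoder.decode(escaped, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String term = "\u6620\u753b"; // first two characters of the expected text
        String escaped = encodeQuery(term);
        System.out.println(escaped);                                      // %E6%98%A0%E7%94%BB
        System.out.println(decodeAs(escaped, "UTF-8").equals(term));      // true
        System.out.println(decodeAs(escaped, "ISO-8859-1").equals(term)); // false
    }
}
```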
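On items 3 and 4: one way to find the hop at fault is to check whether a value is already garbled when it comes back through the JDBC URL above, before Solr sees it. Below is a heuristic Java sketch, assuming the corruption is the cp1252 kind seen in the output: it flags a string whose characters map back to cp1252 bytes that form valid UTF-8 different from the input. The class and method names are hypothetical, and genuine text that happens to contain such sequences (for example "Ã©") would be flagged too.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class MojibakeCheck {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Heuristic: true if s converts back to cp1252 bytes that are valid UTF-8
    // and decode to something different, i.e. s looks like UTF-8 read as cp1252.
    static boolean looksMisdecoded(String s) {
        try {
            ByteBuffer bytes = CP1252.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            String asUtf8 = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes)
                    .toString();
            return !asUtf8.equals(s);
        } catch (CharacterCodingException e) {
            // Not representable in cp1252 (e.g. real CJK text) or not valid UTF-8.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksMisdecoded("\u6620\u753b"));                               // false: correct text
        System.out.println(looksMisdecoded("\u00e6\u02dc\u00a0\u00e7\u201d\u00bb"));       // true: garbled form
    }
}
```

Applied to a value read over JDBC and to the same field as Luke displays it, the first place it returns true would point at the hop that uses the wrong charset.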

What would you recommend changing or checking?

Thanks in advance 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Failing-to-successfully-import-international-characters-via-DIH-tp1753190p1753190.html
Sent from the Solr - User mailing list archive at Nabble.com.
