Also, your browser may be using a platform default encoding instead of
UTF-8. Some macOS and Windows browsers have this problem.
Tomcat sometimes needs to be configured to use UTF-8. If you are on Tomcat,
check these:
http://find.searchhub.org/link?url=http://wiki.apache.org/solr/SolrTomcat
http://find.searchhub.org/?q=utf-8#%2Fp%3Asolr%2Fs%3Alucid%2Cwiki
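
To illustrate the platform-default problem, here is a minimal Python sketch of what can happen when UTF-8 bytes are decoded with a single-byte default charset. It assumes Windows-1252 as the default, and the helper name decode_as is made up for this example. I have not confirmed that this is what happened in your setup; it is only one way Arabic text can end up with replacement characters.

```python
# Sketch: UTF-8 bytes decoded with a single-byte "platform default" charset.
# Assumption: the default is Windows-1252 (cp1252); yours may differ.

def decode_as(text: str, wrong_charset: str) -> str:
    """Encode text as UTF-8, then decode those bytes with another charset."""
    return text.encode("utf-8").decode(wrong_charset, errors="replace")

feh = "\u0641"                       # Arabic letter feh
utf8_bytes = feh.encode("utf-8")     # two bytes: 0xD9 0x81
garbled = decode_as(feh, "cp1252")   # 0x81 has no mapping in Python's cp1252

print(utf8_bytes.hex(), repr(garbled))
```

In Python's cp1252 codec the byte 0x81 is undefined, so the second byte of this letter becomes the replacement character while 0xD9 turns into an unrelated Latin letter. Whatever the exact path in your case, the fix is the same: make every hop (browser, container, database connection) agree on UTF-8.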

----- Original Message -----
| From: "Gora Mohanty" <g...@mimirtech.com>
| To: solr-user@lucene.apache.org
| Sent: Thursday, September 6, 2012 7:13:40 PM
| Subject: Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0
| 
| On 7 September 2012 06:24, kiran chitturi <chitturikira...@gmail.com>
| wrote:
| [...]
| 
| > When i index a text field which has arabic and English like this
| > tweet
| > “@anaga3an: هو سعد الحريري بيعمل ايه غير تحديد الدوجلاس ويختار
| > الكرافته ؟؟”
| > #gcc #ksa #lebanon #syria #kuwait #egypt #سوريا
| > with field_type as 'text_ar' and when i try to see the same field
| > again in
| > solr, it is shown as below.
| > RT @AhmedWagih: لو معملناش حاجة �ي الزيادة
| > السكانية �ي مصر، هنتحول لدولة �قيرة
| > كثي�ة السكان زي بنجلادش #Egypt #EgyEconomy
| >
| > both of the lines do not mean the same, but i have just placed them
| > here as
| > an example. This was the problem i am facing.
| >
| [...]
| 
| The encoding of your input text is being mangled at some point.
| Presuming that your original encoding is UTF-8, I would look at
| how you are indexing into Solr, and the encoding settings on the
| Java container. Solr itself handles UTF-8 perfectly fine, as do
| most Java containers if configured properly, so my first suspicion
| would be the indexing code.
| 
| As it looks like you are pulling from mysql using DIH, check that
| the database character set is UTF-8, and that the connection uses
| UTF-8.
| 
| Regards,
| Gora
| 
