On Feb 18, 2009, at 7:34 AM, revathy arun wrote:
> I am trying to index documents in various languages (foroyo, Chinese,
> Japanese). These have been converted from PDF to text using xpdf. I am
> using the standard analyzer for content analysis, but I am not able to
> search anything in some of the files.
Please provide an example of how you are indexing. What requests are
you sending to Solr? What client API are you using to interface with
Solr? What container are you using: Jetty? Tomcat?
> My guess is that these documents are not in UTF-8 encoding, and hence
> Solr does not return results.
Certainly whatever reads in the text from your data source needs to
know the encoding and use it appropriately.
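To make that concrete, here's a minimal sketch (assuming Python-side preprocessing; `to_utf8` is a hypothetical helper, not part of Solr) of re-encoding extracted text before it reaches Solr:

```python
def to_utf8(raw_bytes, source_encoding):
    """Decode bytes using their known source encoding, re-encode as UTF-8."""
    text = raw_bytes.decode(source_encoding)
    return text.encode("utf-8")

# e.g. a Shift_JIS file produced by xpdf:
# utf8_bytes = to_utf8(open("doc.txt", "rb").read(), "shift_jis")
```

The key point is that the source encoding must be known (or detected) up front; decoding with the wrong one either raises an error or silently produces mojibake.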
> Is there any way to check the encoding of a text/PDF document, or
> convert them to UTF-8 encoding?
I would imagine the conversion could be made to go to UTF-8.
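One rough way to check an unknown file: try decoding it with a list of candidate encodings and keep the first that succeeds. A heuristic sketch (the candidate list is an assumption for CJK-heavy input; a dedicated detector such as chardet or ICU is far more robust):

```python
def guess_encoding(raw_bytes, candidates=("utf-8", "shift_jis", "gb18030", "euc-jp")):
    """Return the first candidate encoding that decodes the bytes cleanly."""
    for enc in candidates:
        try:
            raw_bytes.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None  # nothing matched; encoding unknown
```

Note that "decodes cleanly" is not proof of correctness: many byte sequences are valid in several legacy encodings, so ordering the candidates by likelihood matters.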
> While indexing, I am sending the charset header as UTF-8.
How are you doing this?
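For example, when posting to Solr's XML update handler the charset is declared in the Content-Type header; a sketch of one way to do it (the URL and file name are assumptions for a default local install):

```shell
# Post UTF-8 XML documents to Solr, declaring the charset explicitly
curl 'http://localhost:8983/solr/update?commit=true' \
     -H 'Content-Type: text/xml; charset=utf-8' \
     --data-binary @docs.xml
```

If the bytes in the file are not actually UTF-8, the header alone won't fix anything; Solr will misinterpret them.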
> Any pointers?
If you're using Tomcat, you'll need to set the URIEncoding, as
described here:
<http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4>
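For reference, the URIEncoding attribute goes on the HTTP Connector in Tomcat's conf/server.xml; a sketch with the stock defaults (adjust the port and other attributes to your install):

```xml
<!-- conf/server.xml: decode request URIs as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```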
Erik