utf 8 issue

revathy arun Wed, 18 Feb 2009 04:36:07 -0800

Hi,

I am trying to index documents in various languages (foroyo, Chinese, Japanese).
These have been converted from PDF to text using xpdf.
I am using the standard analyzer for content analysis, but I am not able to
search for anything from some of the files.


My guess is that these documents are not in UTF-8 encoding and hence Solr
does not return any results.


Is there any way to check the encoding of a text/PDF document, or to convert
it to UTF-8 encoding?
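
To make the question concrete, here is a minimal Python sketch of the kind of check and conversion I mean. It assumes the source encoding of each file is already known or can be guessed (for example Shift_JIS for Japanese); the function names are made up for illustration and are not part of Solr or xpdf. A strict decode only tells you whether the bytes are valid UTF-8, not which other encoding they are in.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict mode: raises on any invalid sequence
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from an assumed source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Example: Japanese text stored as Shift_JIS is not valid UTF-8
sjis_bytes = "日本語".encode("shift_jis")
print(is_valid_utf8(sjis_bytes))
utf8_bytes = to_utf8(sjis_bytes, "shift_jis")
print(is_valid_utf8(utf8_bytes))
```

It may also be simpler to fix this at extraction time: as far as I can tell, xpdf's pdftotext accepts an -enc option (e.g. -enc UTF-8), so the text files could be written as UTF-8 in the first place.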

While indexing, I am sending the charset header as UTF-8.

Any pointers?

Thanks
