Hi, I am trying to index documents in various languages (Faroese, Chinese, Japanese). These have been converted from PDF to text using xpdf. I am using the standard analyzer for content analysis, but I am not able to search anything in some of the files.
My guess is that these documents are not UTF-8 encoded, and hence Solr does not return results. Is there any way to check the encoding of a text/PDF document, or to convert it to UTF-8? While indexing, I am sending the charset header as UTF-8. Any pointers? Thanks
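One cheap way to check the extracted text files is to try decoding their raw bytes as UTF-8: Python's `bytes.decode("utf-8")` raises `UnicodeDecodeError` on invalid input. The sketch below does that and, for files that fail, tries a list of candidate legacy encodings and rewrites the text as UTF-8. The candidate list (`CANDIDATE_ENCODINGS`) and the `.utf8` output suffix are assumptions for illustration, not anything xpdf or Solr prescribes. Also note that a clean decode does not prove the guess is right (Shift_JIS bytes, for example, often also decode as GB18030), so the order of the list matters and converted files should be spot-checked.

```python
"""Minimal sketch: check whether text files are valid UTF-8 and,
if not, re-encode them from a guessed legacy encoding."""
from __future__ import annotations

import sys
from pathlib import Path

# Hypothetical candidate encodings, tried in order. This is an assumption;
# adjust it to the languages and sources you actually have.
CANDIDATE_ENCODINGS = ["utf-8", "gb18030", "shift_jis", "euc_jp", "cp1252"]


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def guess_and_decode(data: bytes) -> tuple[str, str]:
    """Return (encoding, text) for the first candidate that decodes
    without errors. A successful decode is only a guess, not proof."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings decoded the data")


def convert_to_utf8(src: Path, dst: Path) -> str:
    """Decode src with the first matching candidate, write dst as UTF-8,
    and return the encoding that was used."""
    enc, text = guess_and_decode(src.read_bytes())
    dst.write_text(text, encoding="utf-8")
    return enc


if __name__ == "__main__":
    for name in sys.argv[1:]:
        path = Path(name)
        if is_utf8(path.read_bytes()):
            print(f"{name}: already valid UTF-8")
        else:
            out = path.with_suffix(path.suffix + ".utf8")  # hypothetical naming
            enc = convert_to_utf8(path, out)
            print(f"{name}: decoded as {enc}, wrote {out}")
```

Usage would be something like `python to_utf8.py *.txt` (the script name is hypothetical), after which the `.utf8` files are what get posted to Solr. If you can re-run the extraction instead, xpdf's `pdftotext` accepts `-enc UTF-8`; its default output encoding is Latin1, which may be why some files are not UTF-8 in the first place.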