I am using PHP cURL to post data to Solr. The container is Tomcat, and I have URIEncoding set to UTF-8 in Tomcat's server.xml file.
This is how it is indexed:

....
$header[] = "Content-Type: text/xml; charset=utf-8";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
$data = curl_exec($ch);
......

However, the document I am sending does not seem to be UTF-8 encoded.

Regards

On 2/18/09, Erik Hatcher <e...@ehatchersolutions.com> wrote:
>
> On Feb 18, 2009, at 7:34 AM, revathy arun wrote:
>
>> I am trying to index various language documents (foroyo, chinese, japanese).
>> These have been converted from PDF to text using xpdf.
>> I am using the standard analyzer for content analysis, but I am not able to
>> search anything from some of the files.
>
> Please provide us an example of how you are indexing... what requests are
> you sending to Solr? What client API are you using to interface with Solr?
>
> What container are you using? Jetty? Tomcat?
>
>> My guess is that these documents are not in UTF-8 encoding and hence Solr
>> does not return results.
>
> Certainly whatever reads in the text from your data source needs to know
> the encoding and use it appropriately.
>
>> Is there any way to check the encoding of a text/PDF document or convert
>> them to UTF-8 encoding?
>
> I would imagine the conversion could be made to go to UTF8
>
>> While indexing I am sending the header for charset as utf-8.
>
> How are you doing this?
>
>> Any pointers?
>
> If you're using Tomcat, you'll need to set the URIEncoding, as described
> here:
>
> <http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4>
>
> Erik
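
On the question in the thread about checking whether a text file is UTF-8 and converting it if not, here is a minimal sketch in PHP. It assumes the mbstring extension is available. The helper name to_utf8 and the ISO-8859-1 source encoding are illustrative assumptions, not something taken from the thread. The real source encoding of the xpdf output has to be known or guessed by the caller.

```php
<?php
// Hypothetical helper (not from the thread): return $text as UTF-8.
// $assumedSource is the caller's guess at the original encoding;
// it cannot be detected reliably in the general case.
function to_utf8(string $text, string $assumedSource): string
{
    // mb_check_encoding() reports whether the bytes are valid UTF-8.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise re-encode from the assumed source encoding.
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// "caf\xE9" is "café" in ISO-8859-1; the lone 0xE9 byte is invalid UTF-8.
$latin1 = "caf\xE9";
$utf8 = to_utf8($latin1, 'ISO-8859-1');

// Print the bytes as hex so the result is visible on any terminal.
echo bin2hex($utf8), "\n"; // prints 636166c3a9
```

The validity check is only a heuristic: it shows that the bytes can be read as UTF-8, not that they were written as UTF-8. The converted string could then be placed in $post_string before the curl_exec() call. It may also be simpler to have xpdf's pdftotext write UTF-8 directly through its -enc option, assuming the installed version supports it.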