I am using PHP cURL to post data to Solr.

Container: Tomcat. I have URIEncoding set to UTF-8 in Tomcat's server.xml file.
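Tomcat's URIEncoding setting controls how the request URI is decoded, which covers the query string of a GET search against /select. It does not control the body of a POST. A minimal PHP sketch of building such a query URL as percent-encoded UTF-8 follows. The base URL http://localhost:8080/solr and the helper name build_select_url() are assumptions for illustration only.

```php
<?php
// Hypothetical helper: builds a Solr select URL whose query string is
// percent-encoded UTF-8, which is what URIEncoding="UTF-8" expects to decode.
function build_select_url(string $base, string $query, int $rows = 10): string
{
    // preg_match with the /u modifier fails on malformed UTF-8 input.
    if (preg_match('//u', $query) !== 1) {
        throw new InvalidArgumentException('query is not valid UTF-8');
    }
    // PHP_QUERY_RFC3986 percent-encodes each byte (rawurlencode style),
    // so UTF-8 input becomes UTF-8 percent-escapes.
    $params = http_build_query(['q' => $query, 'rows' => $rows], '', '&', PHP_QUERY_RFC3986);
    return rtrim($base, '/') . '/select?' . $params;
}

echo build_select_url('http://localhost:8080/solr', "\u{65E5}\u{672C}\u{8A9E}"), "\n";
// prints http://localhost:8080/solr/select?q=%E6%97%A5%E6%9C%AC%E8%AA%9E&rows=10
```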

This is how it is indexed:
....
// The charset here only labels the body; $post_string itself must already be UTF-8.
$header[] = "Content-Type: text/xml; charset=utf-8";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
$data = curl_exec($ch);
......
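The Content-Type header only declares the charset. cURL sends CURLOPT_POSTFIELDS as raw bytes and does not transcode them, so the XML string has to be valid UTF-8 and properly escaped before it is posted. Below is a minimal sketch assuming Solr's XML update format (`<add><doc><field name="...">`). The helper names build_add_xml() and post_to_solr() are hypothetical, and the field names in the usage line are placeholders for whatever your schema defines.

```php
<?php
// Hypothetical helper: builds a Solr <add> message as well-formed UTF-8 XML.
function build_add_xml(array $fields): string
{
    $xml = '<add><doc>';
    foreach ($fields as $name => $value) {
        $value = (string) $value;
        // Refuse to send bytes that are not valid UTF-8.
        if (preg_match('//u', $value) !== 1) {
            throw new InvalidArgumentException("field '$name' is not valid UTF-8");
        }
        $xml .= sprintf(
            '<field name="%s">%s</field>',
            htmlspecialchars((string) $name, ENT_QUOTES | ENT_XML1, 'UTF-8'),
            htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8')
        );
    }
    return $xml . '</doc></add>';
}

// Hypothetical helper: $updateUrl is assumed, e.g. http://localhost:8080/solr/update
function post_to_solr(string $updateUrl, string $xml): string
{
    $ch = curl_init($updateUrl);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: text/xml; charset=utf-8']);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);    // sent as-is, no re-encoding
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('curl failed: ' . curl_error($ch));
    }
    return $response; // the handle is released when $ch goes out of scope
}

echo build_add_xml(['id' => 'doc1', 'content' => 'a & b <c>']), "\n";
// prints <add><doc><field name="id">doc1</field><field name="content">a &amp; b &lt;c&gt;</field></doc></add>
```

Validating before escaping matters because htmlspecialchars() returns an empty string for malformed UTF-8 input unless ENT_SUBSTITUTE is given, which would silently drop the field's text.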
However, the document I am sending does not seem to be UTF-8 encoded.
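If the text files produced by xpdf are not UTF-8, they need converting before indexing. The conversion has to be told the source encoding, because legacy encodings such as Shift_JIS or GBK cannot be detected reliably from the bytes alone. Below is a minimal sketch assuming the mbstring extension is available. The helper name to_utf8() is hypothetical. xpdf's pdftotext also has an -enc option (for example -enc UTF-8) that may make this step unnecessary; check the documentation for your version.

```php
<?php
// Hypothetical helper: returns $text as UTF-8. $sourceEncoding is an
// assumption the caller must supply (e.g. 'SJIS', 'GBK', 'Windows-1252').
function to_utf8(string $text, string $sourceEncoding): string
{
    // Already valid UTF-8: return unchanged so it is not double-encoded.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    $converted = mb_convert_encoding($text, 'UTF-8', $sourceEncoding);
    if (!mb_check_encoding($converted, 'UTF-8')) {
        throw new RuntimeException("conversion from $sourceEncoding failed");
    }
    return $converted;
}

$latin1 = "caf\xE9"; // "café" stored as ISO-8859-1
echo bin2hex(to_utf8($latin1, 'ISO-8859-1')), "\n"; // prints 636166c3a9
```

The validity check only proves the bytes are well-formed UTF-8. A legacy-encoded file that happens to be valid UTF-8 would pass through unchanged, so knowing the real source encoding remains the safer route.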

Regards

On 2/18/09, Erik Hatcher <e...@ehatchersolutions.com> wrote:
>
>
> On Feb 18, 2009, at 7:34 AM, revathy arun wrote:
>
>> I am trying to index documents in various languages (foroyo, Chinese,
>> Japanese). These have been converted from PDF to text using xpdf.
>> I am using the standard analyzer for content analysis, but I am not able
>> to search anything from some of the files.
>>
>
> Please provide us an example of how you are indexing... what requests are
> you sending to Solr?  What client API are you using to interface with Solr?
>
> What container are you using?  Jetty?  Tomcat?
>
>> My guess is that these documents are not UTF-8 encoded and hence Solr
>> does not return results.
>>
>
> Certainly whatever reads in the text from your data source needs to know
> the encoding and use it appropriately.
>
>> Is there any way to check the encoding of a text/PDF document, or to
>> convert it to UTF-8 encoding?
>>
>
> I would imagine the conversion could be made to output UTF-8.
>
>> While indexing, I am sending the charset header as utf-8.
>>
>
> How are you doing this?
>
>> Any pointers?
>>
>
> If you're using Tomcat, you'll need to set the URIEncoding, as described
> here:
>
> <http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4>
>
>        Erik
>
>
>
