Re: Is there a way to force content extraction with a given encoding

Jörn Franke Thu, 07 Nov 2019 23:08:01 -0800

I would convert the files to UTF-8 before posting them and use UTF-8 
throughout your application. Most of the web and most applications use UTF-8. 
If you use other encodings you will keep running into problems like this one.
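
A minimal sketch of that conversion step in Python, assuming the source 
encoding of each file is already known. The function name convert_to_utf8 and 
the file names are hypothetical; "cp1255" is Python's codec name for 
windows-1255, the encoding named in the question. Arabic text is more commonly 
stored as windows-1256 ("cp1256"), so it may be worth checking which one the 
files really use.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1255") -> None:
    """Re-encode a text file from a known legacy encoding to UTF-8.

    source_encoding is an assumption supplied by the caller; nothing here
    detects it. A wrong value raises UnicodeDecodeError or yields garbage.
    """
    # Read raw bytes and decode explicitly, so the platform default
    # encoding never gets involved.
    text = src.read_bytes().decode(source_encoding)
    # Write bytes rather than text to avoid any newline translation.
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    convert_to_utf8(Path("document.txt"), Path("document.utf8.txt"))
```

The converted file can then be posted to /update/extract in place of the 
original.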


> On 08.11.2019 at 07:47, lala <labisha...@gmail.com> wrote:
> 
> I am using the /update/extract request handler to push documents into Solr,
> but some text documents that are encoded as windows-1255 (Arabic texts) are
> not extracted properly: the extracted text is not readable.
> 
> I searched the web and the Solr documentation and found nothing. I need to
> send the file encoding as a parameter, if possible, so that the Tika parser
> knows which encoding to use.
> 
> Is there a way to achieve that?
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
