At least in Java, UTF-8 transcoding is done on a stream basis, so file size is not an issue there.
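
For example, something along these lines (just a sketch; the class name, method name, and 8 KB buffer are arbitrary choices of mine) decodes a Big5 stream and re-encodes it as UTF-8 one buffer at a time, so even an 800MB document never has to be held in memory:

    import java.io.*;

    public class StreamTranscoder {
        // Decodes bytes from 'in' using srcEncoding (e.g. "Big5") and
        // writes them back out as UTF-8, one 8 KB buffer at a time.
        public static void transcode(InputStream in, String srcEncoding,
                                     OutputStream out) throws IOException {
            Reader reader = new BufferedReader(
                    new InputStreamReader(in, srcEncoding));
            Writer writer = new BufferedWriter(
                    new OutputStreamWriter(out, "UTF-8"));
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
            writer.flush();
        }
    }

Memory use stays constant no matter how large the document is, so the 800MB case is no worse than a small one.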

paul


On 27 Jan 2011, at 09:51, prasad deshpande wrote:

> The documents can be huge. Suppose there is an 800MB PDF file to index: I
> would need to translate it to UTF-8 and then send it for indexing. Any
> number of clients may upload files at the same time, so the conversion
> would affect performance. Also, our product already supports localization
> with local encodings.
> 
> Thanks,
> Prasad
> 
> On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
> 
>> Why is converting documents to UTF-8 not feasible?
>> Nowadays every platform offers such a conversion.
>> 
>> Can you give a detailed failure description (maybe with a URL to a sample
>> document you posted)?
>> 
>> paul
>> 
>> 
>> On 27 Jan 2011, at 07:31, prasad deshpande wrote:
>>> I am able to successfully index/search non-English data (like Hebrew and
>>> Japanese) that was encoded in UTF-8.
>>> However, when I tried to index data that was encoded in a local encoding
>>> like Big5, I did not see the desired results.
>>> The contents of the Big5-encoded document looked garbled when I searched
>>> for all indexed documents.
>>> 
>>> Converting a complete document to UTF-8 is not feasible for us.
>>> I am not clear on how Solr supports localization with encodings other
>>> than UTF-8.
>>> 
>>> 
>>> I checked the links below:
>>> 1. http://lucene.apache.org/java/3_0_3/api/all/index.html
>>> 2. http://wiki.apache.org/solr/LanguageAnalysis
>>> 
>>> Thanks and Regards,
>>> Prasad
>> 
>> 
