Thanks, Paul. However, I want to support indexing files in local encodings. How would I achieve that?
On Thu, Jan 27, 2011 at 2:46 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
> At least in Java, UTF-8 transcoding is done on a stream basis. No issue
> there.
>
> paul
>
>
> On 27 Jan 2011 at 09:51, prasad deshpande wrote:
>
> > The size of docs can be huge; suppose there is an 800MB PDF file to
> > index. I need to translate it to UTF-8 and then send the file to be
> > indexed. Any number of clients may upload files at the same time, which
> > will affect performance. And our product already supports localization
> > with local encodings.
> >
> > Thanks,
> > Prasad
> >
> > On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
> >
> >> Why is converting documents to UTF-8 not feasible?
> >> Nowadays any platform offers such services.
> >>
> >> Can you give a detailed failure description (maybe with the URL to a
> >> sample document you post)?
> >>
> >> paul
> >>
> >>
> >> On 27 Jan 2011 at 07:31, prasad deshpande wrote:
> >>
> >>> I am able to successfully index/search non-English data (like Hebrew
> >>> and Japanese) which was encoded in UTF-8.
> >>> However, when I tried to index data which was encoded in a local
> >>> encoding like Big5, I did not see the desired results.
> >>> The contents looked garbled for the Big5-encoded document when I
> >>> searched for all indexed documents.
> >>>
> >>> Converting a complete document to UTF-8 is not feasible.
> >>> I am not clear about how Solr supports these localizations with
> >>> encodings other than UTF-8.
> >>>
> >>> I verified the links below:
> >>> 1. http://lucene.apache.org/java/3_0_3/api/all/index.html
> >>> 2. http://wiki.apache.org/solr/LanguageAnalysis
> >>>
> >>> Thanks and Regards,
> >>> Prasad
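For reference, Paul's point about stream-based transcoding can be sketched in plain Java: a locally-encoded file (e.g. Big5) is decoded and re-encoded to UTF-8 chunk by chunk, so even a very large document is never held in memory all at once. The class and method names here are illustrative, not part of Solr's API.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Copies 'in' (bytes in 'sourceEncoding') to 'out' as UTF-8 bytes,
    // streaming through a fixed-size char buffer.
    public static void transcode(InputStream in, String sourceEncoding,
                                 OutputStream out) throws IOException {
        try (Reader reader = new BufferedReader(
                 new InputStreamReader(in, Charset.forName(sourceEncoding)));
             Writer writer = new BufferedWriter(
                 new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a small Big5-encoded byte sequence to UTF-8.
        byte[] big5Bytes = "中文".getBytes("Big5");
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(big5Bytes), "Big5", utf8);
        System.out.println(utf8.toString("UTF-8"));
    }
}
```

A wrapper like this could run on the client (or in an upload servlet) just before the document is posted to Solr, so the index itself only ever sees UTF-8.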