Thanks, Paul. However, I want to support indexing files in local encodings. How would I achieve that?
On Thu, Jan 27, 2011 at 2:46 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
> At least in Java, UTF-8 transcoding is done on a stream basis. No issue
> there.
>
> paul
>
>
> On 27 Jan 2011 at 09:51, prasad deshpande wrote:
>
> > The size of docs can be huge; suppose there is an 800MB PDF file to
> > index. I need to translate it to UTF-8 and then send the file to be
> > indexed. Any number of clients may upload files at the same time, which
> > will affect performance. And our product already supports localization
> > with local encodings.
> >
> > Thanks,
> > Prasad
> >
> > On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
> >
> >> Why is converting documents to UTF-8 not feasible?
> >> Nowadays any platform offers such services.
> >>
> >> Can you give a detailed failure description (maybe with the URL to a
> >> sample document you post)?
> >>
> >> paul
> >>
> >>
> >> On 27 Jan 2011 at 07:31, prasad deshpande wrote:
> >>
> >>> I am able to successfully index/search non-English data (like Hebrew
> >>> and Japanese) which was encoded in UTF-8.
> >>> However, when I tried to index data which was encoded in a local
> >>> encoding like Big5, I did not see the desired results.
> >>> The contents looked garbled for the Big5-encoded document when I
> >>> searched for all indexed documents.
> >>>
> >>> Converting a complete document to UTF-8 is not feasible.
> >>> I am not clear about how Solr supports these localizations with
> >>> encodings other than UTF-8.
> >>>
> >>> I verified the links below:
> >>> 1. http://lucene.apache.org/java/3_0_3/api/all/index.html
> >>> 2. http://wiki.apache.org/solr/LanguageAnalysis
> >>>
> >>> Thanks and Regards,
> >>> Prasad
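For reference, Paul's point about stream-based transcoding can be sketched in plain Java: a locally-encoded file (e.g. Big5) is decoded and re-encoded to UTF-8 chunk by chunk, so even a very large document is never held in memory all at once. The class and method names here are illustrative, not part of Solr's API.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Copies 'in' (bytes in 'sourceEncoding') to 'out' as UTF-8 bytes,
    // streaming through a fixed-size char buffer.
    public static void transcode(InputStream in, String sourceEncoding,
                                 OutputStream out) throws IOException {
        try (Reader reader = new BufferedReader(
                 new InputStreamReader(in, Charset.forName(sourceEncoding)));
             Writer writer = new BufferedWriter(
                 new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a small Big5-encoded byte sequence to UTF-8.
        byte[] big5Bytes = "中文".getBytes("Big5");
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(big5Bytes), "Big5", utf8);
        System.out.println(utf8.toString("UTF-8"));
    }
}
```

A wrapper like this could run on the client (or in an upload servlet) just before the document is posted to Solr, so the index itself only ever sees UTF-8.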