At least in Java, UTF-8 transcoding is done on a stream basis, so there is no issue there.
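To illustrate, here is a minimal sketch of stream-based transcoding in Java (the class and method names are just for illustration): it reads the source a buffer at a time through an InputStreamReader in the local encoding and writes UTF-8 out, so even an 800 MB file never has to be held in memory at once.

```java
import java.io.*;

public class Transcode {
    // Copy characters from a source stream in the given encoding to a
    // destination stream as UTF-8, a few KB at a time, so the whole
    // document never has to fit in memory.
    public static void transcode(InputStream in, String srcEncoding,
                                 OutputStream out) throws IOException {
        try (Reader reader = new InputStreamReader(in, srcEncoding);
             Writer writer = new OutputStreamWriter(out, "UTF-8")) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a Big5-encoded string through the transcoder.
        String text = "中文測試";
        ByteArrayInputStream in =
            new ByteArrayInputStream(text.getBytes("Big5"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(in, "Big5", out);
        System.out.println(out.toString("UTF-8").equals(text)); // prints "true"
    }
}
```

The same pattern works per upload: wrap each client's stream as it arrives, so the conversion cost is spread across the transfer rather than being a separate whole-file pass.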
paul

On 27 Jan 2011, at 09:51, prasad deshpande wrote:

> The size of docs can be huge; suppose there is an 800 MB PDF file to index.
> I need to translate it to UTF-8 and then send the file for indexing. Any
> number of clients may upload files at the same time, so this will affect
> performance. Our product already supports localization with local encodings.
>
> Thanks,
> Prasad
>
> On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
>
>> Why is converting documents to UTF-8 not feasible?
>> Nowadays any platform offers such services.
>>
>> Can you give a detailed failure description (maybe with the URL to a
>> sample document you post)?
>>
>> paul
>>
>> On 27 Jan 2011, at 07:31, prasad deshpande wrote:
>>
>>> I am able to successfully index/search non-English data (like Hebrew,
>>> Japanese) that was encoded in UTF-8.
>>> However, when I tried to index data encoded in a local encoding such as
>>> Big5, I could not see the desired results.
>>> The contents looked garbled for the Big5-encoded document when I
>>> searched for all indexed documents.
>>>
>>> Converting a complete document to UTF-8 is not feasible.
>>> I am not very clear on how Solr supports these localizations with
>>> encodings other than UTF-8.
>>>
>>> I verified the links below:
>>> 1. http://lucene.apache.org/java/3_0_3/api/all/index.html
>>> 2. http://wiki.apache.org/solr/LanguageAnalysis
>>>
>>> Thanks and Regards,
>>> Prasad