Which languages are you expecting to deal with? Multilingual support is a complex issue; even when you think you need very little of it, it usually turns out to be far more complex than expected, especially around relevancy.
Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 23 February 2015 at 16:19, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
> Hi All,
>
> For our use case we don't really need to do a lot of manipulation of
> incoming text during index time. At most removal of common stop words, and
> tokenizing emails/filenames etc. if possible. We get text documents from
> our end users, which can be in any language (sometimes a combination), and
> we cannot determine the language of the incoming text. Language detection
> at index time is not necessary.
>
> Which analyzer is recommended to achieve basic multilingual search
> capability for a use case like this?
> I have read a bunch of posts about using a combination of StandardTokenizer
> or ICUTokenizer, LowerCaseFilter, and ReversedWildcardFilterFactory, but I
> am looking for ideas, suggestions, and best practices.
>
> http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236
> http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923
> https://issues.apache.org/jira/browse/SOLR-6492
>
> Thanks,
> Rishi.
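
For reference, the combination the quoted question describes (ICUTokenizer, lowercasing/folding, and a reversed-wildcard filter on the index side only) could be sketched in schema.xml roughly as below. This is a minimal, hedged sketch, not a tested recommendation: the field type name is illustrative, ICUFoldingFilterFactory is used in place of a plain LowerCaseFilter (it lowercases and also folds accents across scripts), and the ICU classes assume the analysis-extras contrib jars are on the classpath.

```xml
<!-- Illustrative language-agnostic field type; requires the analysis-extras
     contrib (ICU) jars. Names and attribute values are example choices. -->
<fieldType name="text_multilingual" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Unicode-aware tokenization for mixed/unknown languages -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Lowercasing plus accent/width folding across scripts -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- Index reversed tokens too, so leading-wildcard queries stay fast;
         withOriginal keeps the unreversed token as well -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query side matches the index side, minus the reversal filter -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Whether stopword removal belongs in such a field is debatable when the language is unknown, since any single stopword list will be wrong for most documents.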