Which languages are you expecting to deal with? Multilingual support is a complex issue; even when you think you need very little of it, it usually turns out to be far more complex than expected, especially around relevancy.
Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 23 February 2015 at 16:19, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
> Hi All,
>
> For our use case we don't really need to do a lot of manipulation of
> incoming text during index time. At most removal of common stop words, and
> tokenizing emails/filenames etc. if possible. We get text documents from
> our end users, which can be in any language (sometimes a combination), and
> we cannot determine the language of the incoming text. Language detection
> at index time is not necessary.
>
> Which analyzer is recommended to achieve basic multilingual search
> capability for a use case like this?
> I have read a bunch of posts about using a combination of StandardTokenizer
> or ICUTokenizer, LowerCaseFilter, and ReversedWildcardFilterFactory, but I
> am looking for ideas, suggestions, and best practices.
>
> http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236
> http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923
> https://issues.apache.org/jira/browse/SOLR-6492
>
> Thanks,
> Rishi.
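
For reference, the combination the quoted question describes (ICUTokenizer, lowercasing/folding, and a reversed-wildcard filter on the index side only) could be sketched in schema.xml roughly as below. This is a minimal, hedged sketch, not a tested recommendation: the field type name is illustrative, ICUFoldingFilterFactory is used in place of a plain LowerCaseFilter (it lowercases and also folds accents across scripts), and the ICU classes assume the analysis-extras contrib jars are on the classpath.

```xml
<!-- Illustrative language-agnostic field type; requires the analysis-extras
     contrib (ICU) jars. Names and attribute values are example choices. -->
<fieldType name="text_multilingual" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Unicode-aware tokenization for mixed/unknown languages -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Lowercasing plus accent/width folding across scripts -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- Index reversed tokens too, so leading-wildcard queries stay fast;
         withOriginal keeps the unreversed token as well -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query side matches the index side, minus the reversal filter -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Whether stopword removal belongs in such a field is debatable when the language is unknown, since any single stopword list will be wrong for most documents.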