Re: Basic Multilingual search capability

2015-02-26 Thread Rishi Easwaran
. Just one clarification: when you say ICUFilterFactory, am I correct in thinking it's ICUFoldingFilterFactory? Thanks, Rishi. -Original Message- From: Tom Burton-West To: solr-user Sent: Wed, Feb 25, 2015 4:33 pm Subject: Re: Basic Multilingual search capability Hi Rishi, As

Re: Basic Multilingual search capability

2015-02-25 Thread Tom Burton-West
Hi Alex, > Thanks for the suggestions. These steps will definitely help out with our use case. Thanks for the idea about the lengthFilter to protect our system. > Thanks, > Rishi. > -Original Message- > From: Alexandr

Re: Basic Multilingual search capability

2015-02-25 Thread Rishi Easwaran
: Re: Basic Multilingual search capability Given the limited needs, I would probably do something like this: 1) Put a language identifier in the UpdateRequestProcessor chain during indexing and route out at least known problematic languages, such as Chinese, Japanese, Arabic into individual fields

Re: Basic Multilingual search capability

2015-02-25 Thread Rishi Easwaran
forward to trying it out once it's integrated into main. Thanks, Rishi. -Original Message- From: Trey Grainger To: solr-user Sent: Tue, Feb 24, 2015 1:40 am Subject: Re: Basic Multilingual search capability Hi Rishi, I don't generally recommend a language-insensitive approach excep

Re: Basic Multilingual search capability

2015-02-24 Thread Alexandre Rafalovitch
Given the limited needs, I would probably do something like this: 1) Put a language identifier in the UpdateRequestProcessor chain during indexing and route out at least known problematic languages, such as Chinese, Japanese, Arabic into individual fields 2) Put everything else together into one f
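The routing idea in step 1 can be sketched roughly as follows. This is an illustration only, not the configuration proposed in the thread: it guesses the script from Unicode character names instead of running a real language identifier, and the field names `text_cjk`, `text_ar`, and `text_general` are hypothetical.

```python
import unicodedata

# Hypothetical field names; a real schema would define its own.
SCRIPT_FIELDS = {
    "CJK": "text_cjk",
    "HIRAGANA": "text_cjk",
    "KATAKANA": "text_cjk",
    "ARABIC": "text_ar",
}
DEFAULT_FIELD = "text_general"


def guess_field(text):
    """Pick a target field from the dominant script among the letters in text."""
    counts = {}
    for ch in text:
        if not ch.isalpha():
            continue
        # Unicode names begin with the script, e.g. "CJK UNIFIED IDEOGRAPH-4E2D".
        script = unicodedata.name(ch, "").split(" ")[0]
        field = SCRIPT_FIELDS.get(script, DEFAULT_FIELD)
        counts[field] = counts.get(field, 0) + 1
    if not counts:
        return DEFAULT_FIELD
    return max(counts, key=counts.get)


def route(doc_id, text):
    """Build a document dict with the text placed in the guessed field."""
    return {"id": doc_id, guess_field(text): text}
```

Script detection is a much cruder signal than language identification (it cannot tell Chinese from Japanese kanji, for instance), so treat this as a stand-in for whatever identifier the indexing chain actually uses.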

Re: Basic Multilingual search capability

2015-02-23 Thread Trey Grainger
t if it had capability to tokenize email addresses (ex: he...@aol.com - I think StandardTokenizer already does this), filenames (здравствуйте.pdf), but maybe we can use filters to accomplish that. > > Thanks, > > Rishi. > > -Original Message-

Re: Basic Multilingual search capability

2015-02-23 Thread Rishi Easwaran
solr-user Sent: Mon, Feb 23, 2015 11:17 pm Subject: Re: Basic Multilingual search capability It isn’t just complicated, it can be impossible. Do you have content in Chinese or Japanese? Those languages (and some others) do not separate words with spaces. You cannot even do word search without a
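A small illustration of the point about languages that do not separate words with spaces, using plain Python string splitting as a stand-in for a whitespace tokenizer (this is not Solr code). The overlapping character bigrams are a common fallback for such text and are included here as an assumption, not as something recommended in the thread.

```python
def whitespace_tokens(text):
    """Split on whitespace, as a naive tokenizer would."""
    return text.split()


def char_bigrams(text):
    """Return overlapping two-character tokens, ignoring whitespace."""
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


english = "basic multilingual search"
chinese = "基本多语言搜索"

# English yields one token per word; the Chinese string stays a single token.
print(whitespace_tokens(english))
print(whitespace_tokens(chinese))
# Bigrams give the index something smaller than the whole string to match on.
print(char_bigrams(chinese))
```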

Re: Basic Multilingual search capability

2015-02-23 Thread Walter Underwood
> > Thanks, > Rishi. > > -Original Message- > From: Alexandre Rafalovitch > To: solr-user > Sent: Mon, Feb 23, 2015 5:49 pm > Subject: Re: Basic Multilingual search capability > > > Which languages are you expecting to deal with? Multilingual suppor

Re: Basic Multilingual search capability

2015-02-23 Thread Rishi Easwaran
Subject: Re: Basic Multilingual search capability Which languages are you expecting to deal with? Multilingual support is a complex issue. Even if you think you don't need much, it is usually a lot more complex than expected, especially around relevancy. Regards, Alex. Sign up for my

Re: Basic Multilingual search capability

2015-02-23 Thread Alexandre Rafalovitch
xt documents from our end users, > which can be in any language (sometimes a combination) and we cannot determine > the language of the incoming text. Language detection at index time is not > necessary. > > Which analyzer is recommended to achieve basic multilingual search capability >

Basic Multilingual search capability

2015-02-23 Thread Rishi Easwaran
nd we cannot determine the language of the incoming text. Language detection at index time is not necessary. Which analyzer is recommended to achieve basic multilingual search capability for a use case like this? I have read a bunch of posts about using a combination standardtokenizer or ICUtoke