Re: Indexing of non-english text with Solr, any known limitations?

Bertrand Delacretaz Wed, 12 Apr 2006 07:58:06 -0700

Hi Yonik,

Thanks very much for your replies!


On 12 Apr 06, at 16:45, Yonik Seeley wrote:

On 4/12/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:

...The project that I'm looking at is currently single-language
(French), which I assume can be handled by static configuration of
the appropriate analyzers.


Yes, with a little bit of work (making a Solr Filter Factory or
Tokenizer factory) you can use any Lucene filter, tokenizer, or
analyzer.

OK. If my project actually happens, I'll do my best to contribute such changes if they make sense for Solr.

...Would you need to index multiple languages in the same field?  That
could be trickier, and it seems like you would need an analyzer that
supported that.

The language switch would be per document, so one document might contain French and another one with the same field structure might contain German.

But the content of a single field would be in one language; I don't see a need for language switches inside the content of a field.

Having mixed languages in a single index obviously leads to some imprecision when searching, as the query needs to indicate which analyzer to use. But it still allows people to search in their own language and, when the query includes common words, find documents in other languages.
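
To make the per-document switch concrete, here is a rough sketch in plain Java. It uses no Solr or Lucene APIs: the class name PerLanguageIndex and the two toy "analyzers" are made up for illustration and are nowhere near real French or German analysis. It only shows the idea that the same language code picks the analyzer both at index time and at query time.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical toy index, not a Solr or Lucene class.
public class PerLanguageIndex {

    // Lowercase and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Decompose accented characters and drop the combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Assumption: one stand-in analyzer per language code.
    static final Map<String, Function<String, List<String>>> ANALYZERS = new HashMap<>();
    static {
        // "fr": fold accents, so "Hôtel" becomes "hotel".
        ANALYZERS.put("fr", text -> tokenize(stripAccents(text)));
        // "de": only rewrite sharp s; accents are kept as they are.
        ANALYZERS.put("de", text -> tokenize(text.replace("ß", "ss")));
    }

    // term -> ids of the documents containing it
    private final Map<String, Set<String>> postings = new HashMap<>();

    static List<String> analyze(String lang, String text) {
        Function<String, List<String>> analyzer = ANALYZERS.get(lang);
        if (analyzer == null) {
            throw new IllegalArgumentException("no analyzer for language: " + lang);
        }
        return analyzer.apply(text);
    }

    // Index time: the document's language selects the analyzer.
    public void add(String id, String lang, String text) {
        for (String term : analyze(lang, text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Query time: the query must say which analyzer to use.
    // Returns the ids of documents matching any query term.
    public Set<String> search(String lang, String query) {
        Set<String> hits = new TreeSet<>();
        for (String term : analyze(lang, query)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

With a French document "Hôtel de ville" and a German document "Hotel am See" in the same index, searching for "hôtel" as French finds both, because the term is folded to "hotel". The same query analyzed as German keeps the accent and finds neither, which is the kind of imprecision I mean.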


-Bertrand
