RE: Index multiple languages with multiple analyzers with the same field

Lance Norskog Fri, 28 Sep 2007 19:29:40 -0700

Other people custom-create a separate dynamic field for each language they
want to support.  The spellchecker in Solr 1.2 wants just one field to use
as its word source, so this fits.

We have a more complex version of this problem: we have content with both
English and other languages. Searching is one problem; we also want to have
spelling correction dictionaries for each language. We have many world
languages which need very different handling and semantics, like CJK
processing. We will have to use the multiple-field trick; I don't think we
can shoehorn our complexity into this technique. It is a valiant effort,
though.

It's possible we could separate out the words of each language in the
document, put each set in its own field (words_en_text, words_sp_text, etc.)
and build the default search field with
        <copyField source="*_text" dest="defaultText"/>
Hmm.....
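
A minimal sketch of that layout in schema.xml. Only the words_*_text naming
and the copyField line come from the idea above; the field type names
(text_en, text_sp, text_general) and the filter chains are assumptions made
up for illustration:

```xml
<!-- Fragments only: fieldtype entries go under <types>, field and
     dynamicField entries under <fields>, copyField at the schema top level. -->

<!-- Hypothetical per-language types; the stemmer choice is an assumption. -->
<fieldtype name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldtype>
<fieldtype name="text_sp" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldtype>

<!-- Hypothetical language-neutral type for the combined field: no stemming. -->
<fieldtype name="text_general" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<!-- One dynamic field per language: matches words_en_text, words_sp_text. -->
<dynamicField name="*_en_text" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_sp_text" type="text_sp" indexed="true" stored="true"/>

<!-- Combined default search field; multiValued because several sources copy in. -->
<field name="defaultText" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="*_text" dest="defaultText"/>
```

One caveat with this sketch: copyField copies the raw source text, so
defaultText is analyzed only by its own type's chain. Language-specific
stemming applies only to queries against the per-language fields, and each of
those fields could then serve as the single word source for a per-language
spelling dictionary.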

Lance

-----Original Message-----
From: Thom Nelson [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 28, 2007 12:07 PM
To: solr-user@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re: Index multiple languages with multiple analyzers with the same
field

I had the same problem, but never found a good solution.  The best solution
would be a more dynamic way of determining which analyzer to return, such
as some kind of conditional expression evaluation in the
fieldType/analyzer element, where either the document or the query request
could be used as the comparison object.

<fieldtype name="textMultiLingual" class="solr.TextField">
    <analyzer type="query" expression="request.lang == 'EN'">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldtype>

Analyzers could still be cached by adding the expression to the cache key.
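
For illustration only, this is roughly how a second language might sit next to
the first under the proposal. The expression attribute is not existing Solr
syntax, and the 'ES' value and the Spanish filter chain are assumptions added
here:

```xml
<!-- Hypothetical syntax: Solr does not support expression= on <analyzer>. -->
<fieldtype name="textMultiLingual" class="solr.TextField">
    <!-- Chosen when the (assumed) request.lang value is 'EN'. -->
    <analyzer type="query" expression="request.lang == 'EN'">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <!-- Chosen when the (assumed) request.lang value is 'ES'. -->
    <analyzer type="query" expression="request.lang == 'ES'">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    </analyzer>
</fieldtype>
```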

Unfortunately I have switched jobs, so I don't have the time or motivation
to do this, but it should be a very useful addition.

- Thom

Wu, Daniel wrote:
> Hi,
>  
> I know this probably has been asked before, but I was not able to find 
> it in the mailing list.  So forgive me if I repeated the same question.
>  
> We are trying to build a search application to support multiple 
> languages.  Users can potentially query in any language.  Our first 
> thought is to index the text of all languages in the same 
> field using a language-specific analyzer.  As all the data is indexed 
> in the same field, a query would just find results in the language that 
> matches the user query.
>  
> Looking at the Solr schema, it seems each field can have one and only 
> one analyzer.  Is it possible to have multiple analyzers for the same field?
>  
> Or are there any other approaches that can achieve the same thing?
>  
> Daniel
>
>
