Re: Add 2 stemmers to a textfield?

Daniel Alheiros Tue, 10 Jul 2007 03:37:53 -0700

Hi Thierry.

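I'm not sure this is the best approach: filters in an analyzer run one after
another, so with that chain every token would be stemmed by the English, then
the Dutch, then the French rules, whatever language it is actually written in.
What I've adopted, and so far it is working really well, is to have one field
per language (like content_french and content_dutch, with the field types
text_french and text_dutch). In your schema you declare both, plus one more
field that just receives a copy of them.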
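As a rough illustration of feeding this layout, each document puts its text only in the field for its own language when it is posted to Solr's XML update handler. This assumes a uniqueKey field named `id`, which is not part of the schema in this thread, and the document text is made up:

```xml
<!-- Hypothetical update message: the "id" field and the text are made up. -->
<add>
  <doc>
    <field name="id">fr-1</field>
    <!-- French text goes only into the French-analyzed field -->
    <field name="content_french">Les chevaux mangent dans les prés.</field>
  </doc>
  <doc>
    <field name="id">nl-1</field>
    <!-- Dutch text goes only into the Dutch-analyzed field -->
    <field name="content_dutch">De paarden eten in de weiden.</field>
  </doc>
</add>
```

The copyField rules in the schema then copy whichever language field is filled into the stored `content` field, so results can be displayed from one field regardless of language.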


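Your index-time and query-time analysis have to be compatible, or else the
terms produced from a query won't match the terms stored in the index. Filters
that change the form of a word (lowercasing, stemming, accent removal) normally
go in both analyzers, while something like synonym expansion can be done on one
side only. Take a look at the Lucene documentation ("Lucene in Action" is a
good book and talks about that).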
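A small sketch of that point, using a hypothetical field type (the name `text_sketch` is made up; the filter classes and `synonyms.txt` are the ones from the quoted config). Form-changing filters appear in both analyzers so a word reduces to the same term whether it arrives in a document or in a query, while synonym expansion is done at query time only:

```xml
<!-- Hypothetical field type: the name is illustrative only. -->
<fieldType name="text_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same stemmer as the query side, so indexed and query terms line up -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Query-only: adds synonyms to the query; the index is not affected -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>
```

If the stemmer were only in the index analyzer, a query for the unstemmed form of a word would usually look for a term that was never indexed.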

schema:
    <field name="content_french" type="text_french" indexed="true" stored="false" />
    <field name="content_dutch" type="text_dutch" indexed="true" stored="false" />

    <field name="content" type="text" indexed="false" stored="true" />


    <copyField source="content_french" dest="content"/>
    <copyField source="content_dutch"  dest="content"/>
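The schema above assumes field types named `text_french` and `text_dutch`, which aren't shown in this thread. A minimal sketch of what they might look like (inside the schema's `<types>` section), reusing filters from the quoted config but with only one stemmer per type; the stopword file names `stopwords_fr.txt` and `stopwords_nl.txt` are hypothetical. A single `<analyzer>` with no `type` attribute is used for both indexing and querying, which keeps the two sides compatible:

```xml
<!-- Sketch only: stopword file names are hypothetical. -->
<fieldType name="text_french" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same chain runs at index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt"/>
    <!-- Only the French stemmer is applied to this field -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <!-- Accents are stripped after stemming, so the stemmer sees accented forms -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_dutch" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_nl.txt"/>
    <!-- Only the Dutch stemmer is applied to this field -->
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>
```

Later Solr releases deprecate ISOLatin1AccentFilterFactory in favour of ASCIIFoldingFilterFactory.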


And in solrconfig.xml you can create dismax request handlers, one per
language, each defining boosts in a language-relative way.
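A rough sketch of such per-language handlers. The handler names and boost values are made up, and this assumes a Solr release where dismax is selected with `defType` on `solr.SearchHandler`; older 1.x releases used `solr.DisMaxRequestHandler` for the same purpose:

```xml
<!-- Hypothetical handler names and boosts; tune the qf weights to taste. -->
<requestHandler name="dismax_french" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Favour matches in the French field, but still search the Dutch one -->
    <str name="qf">content_french^2.0 content_dutch^0.5</str>
    <!-- Return the stored copy filled by copyField -->
    <str name="fl">content,score</str>
  </lst>
</requestHandler>

<requestHandler name="dismax_dutch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">content_dutch^2.0 content_french^0.5</str>
    <str name="fl">content,score</str>
  </lst>
</requestHandler>
```

Dismax analyzes the query text separately with each `qf` field's own analyzer, so the French and Dutch stemmers are each applied only to their own field. In releases where `qt` selects a handler by name, a client would pick one with e.g. `/select?qt=dismax_french&q=...`.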


Regards,
Daniel


On 10/7/07 07:38, "Thierry Collogne" <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> Our index contains 2 languages: Dutch and French. I was wondering if it is
> possible to add 2 solr.SnowballPorterFilterFactory filters to one text field,
> like this:
> 
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Dutch" />
>     <filter class="solr.SnowballPorterFilterFactory" language="French" />
>     <filter class="solr.ISOLatin1AccentFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Dutch" />
>     <filter class="solr.SnowballPorterFilterFactory" language="French" />
>     <filter class="solr.ISOLatin1AccentFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> 
> Also, can someone explain to me why a filter is sometimes used at index time
> and sometimes at query time? It is not entirely clear to me what the
> difference is.
> 
> Thank you,
> 
> Thierry


