Re: Strange behavior when searching with accents

Thorsten Scherler Thu, 20 Sep 2007 01:19:18 -0700

On Thu, 2007-09-20 at 10:11 +0200, Thierry Collogne wrote:
> Hello,
> 
> We are experiencing some strange behavior while searching with words
> containing accents.
> We are using two examples "rené" and "matthé"
> 
> When we search for "rené" or for "rene", we get the same results, so that is
> ok.
> But when we search for "matthé" or for "matthe", we get two totally
> different results.
> 
> Can someone tell me why this happens? We would like the results to be the
> same.


That depends heavily on your schema. Do you use
<filter class="solr.ISOLatin1AccentFilterFactory"/>?

I am using the following, with the accent filter in both the index and
the query analyzer, and it works like a charm:
<fieldType name="stringSimilar" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <!--<tokenizer class="solr.LowerCaseTokenizerFactory"/>-->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
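
The type only takes effect on fields that reference it. As a rough
sketch (the field name "name" is just a placeholder, not something from
your schema), the field declaration would look something like this:

<!-- hypothetical field; replace "name" with the field you search on -->
<field name="name" type="stringSimilar" indexed="true" stored="true"/>

Documents indexed before the change keep their old tokens, so you will
most likely need to reindex before "matthé" and "matthe" return the same
results.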

HTH

salu2
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java                      consulting, training and solutions
