On Thu, 2007-09-20 at 10:11 +0200, Thierry Collogne wrote:
> Hello,
>
> We are experiencing some strange behavior while searching with words
> containing accents.
> We are using two examples "rené" and "matthé"
>
> When we search for "rené" or for "rene", we get the same results, so that is
> ok.
> But when we search for "matthé" or for "matthe", we get two totally
> different results.
>
> Can someone tell me why this happens? We would like the results to be the
> same.
That highly depends on your schema. Do you use <filter
class="solr.ISOLatin1AccentFilterFactory"/>?
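Accent folding is what makes "rené" and "rene" end up as the same indexed term. Here is a rough sketch of that effect on the two terms from the question. It is an approximation only: it uses Unicode decomposition via java.text.Normalizer as a stand-in, whereas ISOLatin1AccentFilterFactory applies its own fixed mapping of Latin-1 characters, so results can differ for characters outside that set. The class name AccentFoldDemo is hypothetical.

```java
import java.text.Normalizer;

public class AccentFoldDemo {
    // Approximate accent folding: decompose to NFD (e.g. "é" -> "e" + combining
    // acute accent), then drop all combining marks.
    // Assumption: a stand-in for ISOLatin1AccentFilterFactory, not its real code.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        for (String term : new String[] {"rené", "rene", "matthé", "matthe"}) {
            System.out.println(term + " -> " + fold(term));
        }
    }
}
```

If this folding runs on both the indexed text and the query, "matthé" and "matthe" should meet in the same term, just like "rené" and "rene".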
I am using the following and it works like a charm:
<fieldType name="stringSimilar" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <!--<tokenizer class="solr.LowerCaseTokenizerFactory"/>-->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
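The key point of this field type is that the index and query analyzers run the same chain, so a document term and a query term go through identical transformations. A minimal sketch of that chain in plain Java, assuming: whitespace splitting for the tokenizer, a hypothetical three-word stop list in place of stopwords.txt, Normalizer-based folding in place of ISOLatin1AccentFilterFactory, and no synonym step. AnalyzerChainSketch and its methods are illustrative names, not Solr API.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainSketch {
    // Hypothetical stop words; the real list would come from stopwords.txt.
    static final Set<String> STOP_WORDS = Set.of("de", "het", "een");

    // Mirrors the filter order in the field type: whitespace tokenizer,
    // stop filter (ignoreCase), lowercase, accent folding. Synonyms omitted.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String lower = raw.toLowerCase(Locale.ROOT);
            if (lower.isEmpty() || STOP_WORDS.contains(lower)) {
                continue; // drop empty input and stop words
            }
            // Approximate accent folding (stand-in for the Latin-1 mapping).
            String folded = Normalizer.normalize(lower, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");
            tokens.add(folded);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The same chain runs on the indexed text and on the query text.
        System.out.println(analyze("de Matthé")); // as indexed
        System.out.println(analyze("matthe"));    // as queried
        System.out.println(analyze("matthé").equals(analyze("MATTHE")));
    }
}
```

If the two spellings still return different results with a chain like this, it is worth checking whether some other filter (for example a stemmer) runs before the accent filter or only on one side.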
HTH
salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions