Iso accents and wildcards

Nicolas Leconte Fri, 30 Oct 2009 08:46:54 -0700

Hi all,

I have a field that contains accented characters, and I want to be able to search it while ignoring accents.

I have set up that field with:
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

<filter class="solr.LowerCaseFilterFactory"/>

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

<filter class="solr.SnowballPorterFilterFactory" language="French"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

In the index, the word "économie" is reduced to "econom": the accent is removed by the ISOLatin1AccentFilterFactory and the end of the word is removed by the SnowballPorterFilterFactory.
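To make the accent removal concrete, here is a rough Python sketch of the kind of folding I expect from the accent filter. It is only an approximation based on Unicode decomposition, not Solr's actual implementation, and the function name fold_accents is mine. It does not do the stemming step.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate accent folding: decompose characters, then drop combining marks.

    Assumption: this only mimics the accent filter for characters that
    decompose into a base letter plus a combining accent.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("économie"))  # economie
```

The stemmer would then reduce "economie" to the "econom" that I see in the index.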

When I query with title:econ* I get the correct results, but if I query with title:écon* I get no results. If I query with title:économ (the exact word in the index) it works, so there might be something wrong with the wildcard. As far as I understand, the analyzer should be applied in exactly the same way at both index and query time.
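One workaround I could try is to lowercase and fold the accents on the client side before appending the wildcard. This assumes that wildcard terms are not passed through the analyzer, which I have not confirmed. The sketch below is Python, and the helper names fold_accents and build_prefix_query are hypothetical, not part of Solr.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate accent folding via Unicode decomposition (not Solr's own code)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_prefix_query(field: str, prefix: str) -> str:
    """Build a wildcard query string from a user-typed prefix.

    The prefix is lowercased and accent-folded by hand, on the assumption
    that the query-time analyzer would not do this for wildcard terms.
    """
    normalized = fold_accents(prefix.lower())
    return f"{field}:{normalized}*"


print(build_prefix_query("title", "écon"))  # title:econ*
```

Even if this works, a prefix longer than the stemmed form (for example économi*) would still not match "econom", because the stemmer is not applied to the prefix.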

I have tried changing the order of the filters (putting the ISOLatin1AccentFilterFactory first), without any change in the results.

Could anybody help me with this and point out what may be wrong with my schema?
