Hi all,

I have a field that contains accented characters, and I want to be able to search it while ignoring accents.
I have set up that field with:
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SnowballPorterFilterFactory" language="French"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

In the index, the word "économie" is stored as "econom": the accent is removed by the ISOLatin1AccentFilterFactory and the end of the word is removed by the SnowballPorterFilterFactory.
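
To make the accent-removal step concrete, here is a minimal Python sketch of the kind of folding I expect at index time. It only approximates what the ISOLatin1AccentFilterFactory does and does not reproduce the French stemmer; the function name `fold_accents` is my own and is not part of Solr.

```python
import unicodedata

def fold_accents(text):
    """Strip combining accents (an approximation of an accent-folding filter)."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_accents("économie"))  # accent removed, stemming not applied here
```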

When I query with title:econ* I get the correct results, but when I query with title:écon* I get none. If I query with title:économ (the indexed term, apart from the accent) it works, so there might be something wrong with the wildcard. As far as I understand, the same analyzer should be used at both index and query time.
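
These symptoms would be consistent with the wildcard term being compared to the index without going through the analyzer, although I have not confirmed that this is what Solr actually does. Below is a small Python sketch of that assumption. The names `indexed_terms`, `fold_accents` and `prefix_matches` are hypothetical, and the "index" is just a list holding the folded, stemmed term.

```python
import unicodedata

def fold_accents(text):
    """Strip combining accents (an approximation of an accent-folding filter)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical index content: "économie" after accent folding and stemming.
indexed_terms = ["econom"]

def prefix_matches(prefix, terms, analyze=False):
    """Return the terms starting with prefix, optionally folding the prefix first."""
    if analyze:
        prefix = fold_accents(prefix.lower())
    return [term for term in terms if term.startswith(prefix)]

print(prefix_matches("econ", indexed_terms))                # matches
print(prefix_matches("écon", indexed_terms))                # no match: accent kept
print(prefix_matches("écon", indexed_terms, analyze=True))  # matches once folded
```

If this is what happens, removing the accent from the prefix on the client side before sending title:écon* would be one workaround, but I would prefer a fix in the schema.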

I have tried changing the order of the filters (putting the ISOLatin1AccentFilterFactory first), without any effect.

Could anybody help me with this and point out what may be wrong with my schema?
