Hi all,
I have a field that contains accented characters, and what I want is to
be able to search on it while ignoring accents.
I have set up that field with the following analyzer:
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
In the index, the word "économie" is stored as "econom": the accent
is removed by the ISOLatin1AccentFilterFactory and the end of the
word is stripped by the SnowballPorterFilterFactory.
When I query title:econ* I get the correct results, but if
I query title:écon* I get no results at all.
If I query title:économ (the exact word as indexed) it works,
so the problem seems to be related to the wildcard.
As far as I understand, the same analyzer should be applied at both
index time and query time.
I have tried changing the order of the filters (putting the
ISOLatin1AccentFilterFactory first), without success.
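For reference, the reordered chain looked roughly like the fragment below. This is a reconstruction of the change described above, assuming "on top" means directly after the tokenizer and that every other filter kept its original position and settings:

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- accent folding moved to the top of the filter chain (assumed position) -->
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

With this order, "économie" still ends up indexed as "econom", and title:écon* still returns nothing.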
Could anybody help me and point out what may be wrong with my
schema?