On Thu, 2007-09-20 at 10:11 +0200, Thierry Collogne wrote:
> Hello,
>
> We are experiencing some strange behavior while searching with words
> containing accents.
> We are using two examples "rené" and "matthé"
>
> When we search for "rené" or for "rene", we get the same results, so that is
> ok.
> But when we search for "matthé" or for "matthe", we get two totally
> different results.
>
> Can someone tell me why this happens? We would like the results to be the
> same.

That depends heavily on your schema. Do you use
<filter class="solr.ISOLatin1AccentFilterFactory"/>?

I am using the following and it works like a charm:

<fieldType name="stringSimilar" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <!--<tokenizer class="solr.LowerCaseTokenizerFactory"/>-->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

HTH

salu2
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java consulting, training and solutions