ASCIIFoldingFilter is probably the filter best known for replacing accented characters with their unaccented equivalents. However, I don't see it in your config.
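To make the folding idea concrete, here is a minimal Python sketch. It is only an approximation built on Unicode decomposition, not the actual ASCIIFoldingFilter implementation, and the function name fold_diacritics is made up for illustration.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD).

    Hypothetical helper for illustration only; Solr's own filters
    may handle more characters and use a different approach.
    """
    # NFD splits "à" into "a" followed by U+0300 (combining grave accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


folded = fold_diacritics("intertestualità")
print(folded)
```

With this kind of folding applied at both index and query time, "intertestualità" and "intertestualita" end up as the same token.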
As for the issue itself, you can easily debug it with the Solr analysis tool.

Regards,
Jayendra

On Fri, Aug 13, 2010 at 3:20 AM, Andrea Gazzarini <andrea.gazzar...@atcult.it> wrote:

> Hi,
> I have a problem regarding a diacritic character on my query string :
>
> q=intertestualità
>
> which is encoded in
>
> q=intertestualit%E0
>
> What I'm not understanding is the following query response fragments :
>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">23</int>
>   <lst name="params">
>     <str name="sort">score desc</str>
>     <str name="fl">score,title</str>
>     <str name="debugQuery">on</str>
>     <str name="indent">on</str>
>     <str name="start">0</str>
>     <str name="q">intertestualit</str>
>     <str name="version">2.2</str>
>     <str name="rows">3</str>
>   </lst>
>
> and
>
> <lst name="debug">
>   <str name="rawquerystring">intertestualit</str>
>   <str name="querystring">intertestualit</str>
>
> I saw that my index contains the token "intertestualita" (with the 'à' char
> replaced with 'a'). Indeed if I query for "intertestualita" I found my
> results.
> The queried field is configured with the same chain :
>
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j"
>             composed="false" remove_diacritics="true" remove_modifiers="true"
>             fold="true" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="schema.UnicodeNormalizationFilterFactory"
>             version="icu4j" composed="false" remove_diacritics="true"
>             remove_modifiers="true" fold="true" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldtype>
>
> So my question is : who is removing the "à" (%E0) characters from the input
> query? It seems that the query arrives to SOLR already without that
> character...
>
> Regards,
> Andrea
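
On the question of who drops the "à": one possible explanation, which is an assumption and not something confirmed in this thread, is an encoding mismatch. %E0 is how "à" is percent-encoded in ISO-8859-1, whereas a server that decodes the query string as UTF-8 sees a lone 0xE0 byte, which is not valid UTF-8. The Python sketch below only illustrates that mismatch; it does not reproduce Solr or the servlet container, and whether an invalid byte is dropped or replaced depends on the decoder in use.

```python
from urllib.parse import quote, unquote

term = "intertestualità"

# The same term percent-encoded under two different charsets.
latin1_encoded = quote(term, encoding="latin-1")  # "à" becomes a single byte, 0xE0
utf8_encoded = quote(term, encoding="utf-8")      # "à" becomes two bytes, 0xC3 0xA0

# Decoding the Latin-1 form as UTF-8: the lone 0xE0 byte is invalid.
# errors="ignore" drops it; this leniency is an assumption made for the demo.
decoded_mismatch = unquote(latin1_encoded, encoding="utf-8", errors="ignore")

# Decoding the UTF-8 form as UTF-8 round-trips cleanly.
decoded_match = unquote(utf8_encoded, encoding="utf-8")

print(latin1_encoded)    # intertestualit%E0
print(utf8_encoded)      # intertestualit%C3%A0
print(decoded_mismatch)  # intertestualit
print(decoded_match)     # intertestualità
```

If that is what is happening here, sending the query as q=intertestualit%C3%A0, or configuring the container to decode URIs with the charset the client actually uses, would be worth trying.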