Re: Strange behavior when searching with accents

Thorsten Scherler Thu, 20 Sep 2007 05:14:38 -0700

On Thu, 2007-09-20 at 13:33 +0200, Thierry Collogne wrote:
> We are using this schema definition
>



Thierry, try moving solr.ISOLatin1AccentFilterFactory up the filter
chain, directly after the tokenizer, like:

...
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
...

for both the index and the query analyzer.

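This way you make sure that all accents are gone before any further
filtering happens. With the current order, the stemmer and the other
filters see the accented form, so "matthé" and "matthe" can end up as
different tokens even though the accent is stripped at the very end.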
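For reference, a sketch of the field type from your schema with only that one change applied to both analyzers. All other filters and attributes are copied unchanged from the schema you posted; the XML comments are mine and purely explanatory. Untested, so please check it against the analyzer admin page.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip accents first, so every later filter sees "matthe", never "matthé" -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- same position as at index time, so query terms are normalized the same way -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

One side effect of this order: the query-time SynonymFilterFactory now only sees unaccented tokens, so any accented entries in synonyms.txt would also need their unaccented form to keep matching.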

You will need to reindex all documents afterwards; otherwise the index
still holds the tokens produced by the old filter order.

HTH

salu2

> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>         <filter class="solr.ISOLatin1AccentFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>         <filter class="solr.ISOLatin1AccentFilterFactory"/>
>       </analyzer>
>     </fieldType>
> 
> I will take a look at the analyzer tool.
> 
> Thank you both for the quick response.
> 
> On 20/09/2007, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> >
> > On 9/20/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:
> >
> > > ..when we search for "matthé" or for "matthe", we get two totally
> > > different results....
> >
> > The analyzer admin tool should help you find out what's happening, see
> >
> > http://wiki.apache.org/solr/FAQ#head-b25df8c8393bbcca28f1f344c432975002e29ca9
> >
> > -Bertrand
> >
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java                      consulting, training and solutions
