Re: Iso accents and wildcards

Nicolas Leconte Sun, 01 Nov 2009 23:27:03 -0800

Thanks for the tips, I will try exactly what you suggest.


Avlesh Singh wrote:

When I query with title:econ* I get the correct results, but when I
query with title:écon* I get no results.
If I query with title:économ (the exact word of the index) it works, so
there might be something wrong with the wildcard.
As far as I understand, the same analyzer should be used at both index
and query time.

Wildcard queries are not analyzed, hence the "inconsistent" behaviour: the
prefix "écon" is compared against the indexed terms as typed, accent
included, while the index only holds the folded term "econom".
The easiest way out is to define one more field, "title_orginal", as an
untokenized field. While querying, you can use both fields at the same
time, e.g. q=(title:écon* title_orginal:écon*). In either case, you would
get the desired matches.
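
To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. It is not Solr code: analyze() is a hypothetical stand-in that only lowercases and folds accents (stemming is left out), and the indexed term "econom" is taken from the original post. Folding the prefix before matching is shown only to illustrate what analysis would have done; it is an assumption, not part of the suggestion above.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks after NFKD decomposition, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Hypothetical stand-in for the field analyzer: lowercase, then fold accents.

    The real chain in the post also stems; that step is omitted here.
    """
    return fold_accents(text.lower())


def prefix_match(prefix, terms):
    """Compare the prefix as typed, the way an unanalyzed wildcard query would."""
    return [term for term in terms if term.startswith(prefix)]


# Term as stored in the index after analysis (value quoted in the thread).
indexed_terms = ["econom"]

raw_hits = prefix_match("écon", indexed_terms)               # accent kept
folded_hits = prefix_match(analyze("écon"), indexed_terms)   # accent folded first

print(raw_hits)
print(folded_hits)
```

The raw prefix finds nothing because "écon" and "econom" differ at the first character, while the folded prefix "econ" does match the indexed term.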

Cheers
Avlesh

On Fri, Oct 30, 2009 at 9:19 PM, Nicolas Leconte <nicolas.ai...@aidel.com> wrote:

Hi all,

I have a field that contains accented characters, and I want to be able to
search it while ignoring accents.
I have set up that field with:
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

In the index, the word "économie" is reduced to "econom": the accent is
removed by the ISOLatin1AccentFilterFactory and the end of the word is
removed by the SnowballPorterFilterFactory.

When I query with title:econ* I get the correct results, but when I
query with title:écon* I get no results.
If I query with title:économ (the exact word of the index) it works, so
there might be something wrong with the wildcard.
As far as I understand, the same analyzer should be used at both index
and query time.

I have tried changing the order of the filters (putting the
ISOLatin1AccentFilterFactory first), with no change in the results.

Could anybody help me with this and point out what may be wrong with my
schema?
