Iso accents and wildcards
Hi all,

I have a field that contains accented characters, and I want to be able to search it while ignoring accents. I have set up that field with:

generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" /> words="stopwords.txt" />

In the index, the word "économie" becomes "econom": the accent is removed by the ISOLatin1AccentFilterFactory and the end of the word is removed by the SnowballPorterFilterFactory.

When I query title:econ* I get the correct results, but when I query title:écon* I get none. If I query title:économ (the exact word of the index) it works, so there might be something wrong with the wildcard. As far as I understand, the same analyzer should be used at both index and query time.

I tried changing the order of the filters (putting the ISOLatin1AccentFilterFactory first), without any result.

Could anybody help me with this and point out what may be wrong with my schema?
Re: Iso accents and wildcards
Thanks for the explanation, now I clearly understand why it doesn't work as I was expecting :)

jfmel...@free.fr wrote:

If the request contains any wildcard, the filters are not called: no ISOLatin1AccentFilterFactory and no SnowballPorterFilterFactory!

"économie" is indexed as "econom", so Solr does not find:
- a term starting with "éco" (éco*)
- a term starting with "economi" (economi*)

If you index manger, mangé and mangue, the indexed terms will be mang and mangu.

requests -> results
manger -> manger, mangé
mangé -> manger, mangé
mang -> manger, mangé
mangu -> mangue
mang* -> manger, mangé, mangue
mang? -> mangue (and not mangé)
mangé* -> nothing

Jean-François

"Nicolas Leconte" wrote:
| Hi all,
|
| I have a field that contains accented characters, and I want to be able
| to search it while ignoring accents.
| [...]
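To make the behaviour described above concrete, here is a minimal Python sketch (it is not Solr code). It assumes a toy index holding only the terms mentioned in this thread, and it uses a hypothetical `fold_accents` helper that approximates accent folding with Unicode decomposition. The wildcard pattern is compared with the indexed terms exactly as typed, with no analysis.

```python
import fnmatch
import unicodedata

# Toy index (assumption): indexed term -> original words, taken from the thread.
INDEX = {
    "econom": ["économie"],
    "mang": ["manger", "mangé"],
    "mangu": ["mangue"],
}


def fold_accents(text):
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def wildcard_search(pattern):
    """Match the pattern against indexed terms as typed (no analysis)."""
    hits = []
    for term, words in INDEX.items():
        if fnmatch.fnmatchcase(term, pattern):
            hits.extend(words)
    return hits


print(wildcard_search("econ*"))                # ['économie']
print(wildcard_search("écon*"))                # []
print(wildcard_search(fold_accents("écon*")))  # ['économie']
print(wildcard_search("mang*"))                # ['manger', 'mangé', 'mangue']
print(wildcard_search("mang?"))                # ['mangue']
print(wildcard_search("mangé*"))               # []
```

Folding accents in the client before sending the wildcard would fix écon* in this sketch, but not mangé*: the folded pattern mange* still matches neither mang nor mangu, because stemming is skipped as well.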
Re: Iso accents and wildcards
Thanks for the tips, I will try to do exactly what you suggest.

Avlesh Singh wrote:

| When I request with title:econ* I can have the correct answers, but if
| I request with title:écon* I have no answers. If I request with
| title:économ (the exact word of the index) it works, so there might be
| something wrong with the wildcard. As far as I can understand the
| analyser should be use exactly the same in both index and query time.

Wildcard queries are not analyzed, hence the "inconsistent" behaviour. The easiest way out is to define one more field, "title_orginal", as an untokenized field. While querying, you can use both fields at the same time, e.g. q=(title:écon* title_orginal:écon*). In any case, you would get the desired matches.

Cheers
Avlesh

On Fri, Oct 30, 2009 at 9:19 PM, Nicolas Leconte wrote:
| Hi all,
|
| I have a field that contains accented characters, and I want to be able
| to search it while ignoring accents.
| [...]
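As an illustration of the two-field query suggested above, the following Python sketch builds the q value and its URL-encoded form. The helper name `build_wildcard_query` is hypothetical, the field names are the ones used in this thread, and the sketch assumes the prefix contains no characters that need escaping in the query syntax.

```python
from urllib.parse import urlencode


def build_wildcard_query(prefix, fields=("title", "title_orginal")):
    """Build one prefix clause per field, joined inside parentheses."""
    clauses = " ".join(f"{field}:{prefix}*" for field in fields)
    return f"({clauses})"


query = build_wildcard_query("écon")
print(query)                    # (title:écon* title_orginal:écon*)
print(urlencode({"q": query}))  # URL-encoded form for the request's query string
```

The same prefix is sent to both fields, so whichever field holds a matching term (accented or not) can contribute results.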