On Thu, 2007-09-20 at 15:27 +0200, Bertrand Delacretaz wrote:
> On 9/20/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:
>
> > ...Thank you very much. Moving the
> > <filter class="solr.ISOLatin1AccentFilterFactory"/>
> > up in the chain fixed it....
>
> Yes, the problem was the EnglishPorterFilterFactory before the accents
> removal: the stemmer doesn't know about accents, so no stemming
> occurred on "matthé" whereas "matthe" was stemmed to "matth".
>
> BTW, your "rené" example makes me think you're indexing French. If
> that's the case you might want to use a stemmer configured for that
> language, for example
>
> <filter
>   class="solr.SnowballPorterFilterFactory"
>   language="French"/>
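
A minimal sketch of the filter ordering discussed above, as it might appear in a schema.xml analyzer chain. The field type name "text_stemmed" and the choice of tokenizer and lowercase filter are assumptions for illustration, not taken from Thierry's schema; only the relative order of the accent filter and the stemmer comes from the thread.

```xml
<!-- Hypothetical field type; the name is illustrative only. -->
<fieldType name="text_stemmed" class="solr.TextField">
  <analyzer>
    <!-- Tokenizer and lowercasing are assumed, not from the thread. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Strip accents first, so "matthé" reaches the stemmer as "matthe". -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- The English stemmer then sees unaccented input only. -->
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>
```

Whether accents should also be stripped ahead of a language-specific Snowball stemmer is a separate question: such a stemmer may rely on the accented forms, so it is worth testing both orders on real data before copying this one.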

Bertrand, does the French Snowball work fine?

A colleague of mine exchanged mails with Porter about the Spanish filter
and he came to the conclusion that it is not really working well for
Spanish:

"So -orio on the whole changes meaning too much (acceso = access,
accesorio = accessory differ as much in Spanish as English); -atorio
similarly (aclarar = to rinse, clear (in a very general sense), brighten
up; aclaratorio = explanatory).

Diminutives, augmentatives usually fall under (a) and (c). -illo, -ote,
-isimo are in this category.

-al and -iz look like plausible candidates for ending removal, but,
unlike their English counterparts, removing them makes little difference
or improvement. Similarly with -ion removal after -s.

There is a difficulty with pure vowel endings, and the stemmer can't
always get this right. So in English 'academic' is stemmed to 'academ'
but 'academy' does not lose the final -y (or -i). This explains the
residual vowels with -io, -ia endings etc."

salu2
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java consulting, training and solutions