On Thu, 2007-09-20 at 15:27 +0200, Bertrand Delacretaz wrote:
> On 9/20/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:
> 
> > ...Thank you very much. Moving the
> > <filter class="solr.ISOLatin1AccentFilterFactory"/> up in the chain fixed it....
> 
> Yes, the problem was the EnglishPorterFilterFactory before the accents
> removal: the stemmer doesn't know about accents, so no stemming
> occurred on "matthé" whereas "matthe" was stemmed to "matth".
> 
> BTW, your "rené" example makes me think you're indexing French. If
> that's the case, you might want to use a stemmer configured for that
> language, for example:
> 
> <filter
>   class="solr.SnowballPorterFilterFactory"
>   language="French"/>
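For reference, a minimal sketch of a complete analyzer chain along those lines: the accent filter sits ahead of the stemmer (the order that fixed Thierry's problem) and the French Snowball stemmer replaces the English Porter one. The field type name "text_fr", the whitespace tokenizer and the lowercase filter are assumptions for illustration, not Thierry's actual schema:

```xml
<!-- Hypothetical field type: the name "text_fr" and the tokenizer and
     lowercase steps are illustrative assumptions. -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Fold accents before stemming, so "matthé" and "matthe"
         reach the stemmer as the same token -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- Stem last, with a stemmer configured for French -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>
```

Since the French Snowball rules themselves include accented suffixes, it may be worth testing whether folding accents before or after the stemmer works better for a given corpus.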

Bertrand, does the French Snowball stemmer work well?

A colleague of mine exchanged emails with Porter about the Spanish filter
and came to the conclusion that it does not really work well for
Spanish:

"So -orio on the whole changes meaning too much (acceso = access,
accesorio = accessory differ as much in Spanish as in English); -atorio
similarly (aclarar = to rinse, clear (in a very general sense), brighten
up; aclaratorio = explanatory).

Diminutives and augmentatives usually fall under (a) and (c). -illo,
-ote and -isimo are in this category.

-al and -iz look like plausible candidates for ending removal, but,
unlike their English counterparts, removing them makes little difference
or improvement. Similarly with -ion removal after -s. 

There is a difficulty with pure vowel endings, and the stemmer can't
always get this right. So in English 'academic' is stemmed to 'academ'
but 'academy' does not lose the final -y (or -i). This explains the
residual vowels with -io, -ia endings, etc."

Regards,
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java                      consulting, training and solutions
