Thanks for the explanation, Mitch. You're right. There can't be universal stemmers.
What about multi-language stemmers? I'm mostly interested in English, Spanish, German, French, and Italian. Are there any stemmers that handle those languages? If not, what's the recommended way to deal with documents in multiple languages?

--- On Mon, 4/19/10, MitchK <mitc...@web.de> wrote:

> From: MitchK <mitc...@web.de>
> Subject: Re: LucidWorks Solr
> To: solr-user@lucene.apache.org
> Date: Monday, April 19, 2010, 4:36 AM
>
> Andy, I think it is important to know what a stemmer really is.
>
> It reduces words to their infinitives. Those infinitives are not
> always the real infinitive, but for the system they serve as one,
> since all derived forms can be reduced to the same form.
> That's a stemmer.
>
> Accordingly, there can't be one stemmer for every language, because
> every language has its own rules for reducing a word to its
> infinitive.
>
> If you apply a stemmer for the English language to a German document,
> the results might be unexpected. However, sometimes it still works
> well enough.
>
> Keep in mind that this is an algorithm. It is not important whether
> the created infinitive is the real infinitive. It is only important
> that most of the derived forms can be reduced to the same basic form.
> Please ask if something is not clear.
>
> KStem:
> The wiki[1] says that KStem is less aggressive than the standard
> stemmer. I guess that this means that there are more rules for how to
> reduce a word to its infinitive, and so the results might be better.
>
> [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
>
> Kind regards
> - Mitch
> --
> View this message in context:
> http://n3.nabble.com/LucidWorks-Solr-tp727341p729110.html
> Sent from the Solr - User mailing list archive at Nabble.com.
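
To make the point about language-specific rules concrete, here is a toy sketch in Python. The suffix lists and the stem() helper are made up for illustration only; they are not the rules used by the Porter, Snowball, or KStem stemmers, and real stemmers are far more careful. It only shows that derived forms collapse to one base form when the rules match the language, and do not when an English rule set is applied to German words.

```python
# Toy suffix-stripping stemmer. The suffix lists below are illustrative
# assumptions, not the rules of any real stemmer.
ENGLISH_SUFFIXES = ["ing", "ed", "s"]
GERMAN_SUFFIXES = ["en", "st", "e", "t"]


def stem(word, suffixes, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


english_words = ["walks", "walked", "walking"]
german_words = ["spielen", "spielst", "spielt", "spiele"]

# Rules that match the language: all derived forms share one base form.
english = [stem(w, ENGLISH_SUFFIXES) for w in english_words]
german = [stem(w, GERMAN_SUFFIXES) for w in german_words]

# English rules applied to German words: no suffix matches, so every
# form stays distinct and a search for one form misses the others.
german_with_english_rules = [stem(w, ENGLISH_SUFFIXES) for w in german_words]

print(english)
print(german)
print(german_with_english_rules)
```

Note that the base form does not have to be a real word or a real infinitive; it only has to be the same for all the forms that should match each other.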