Thanks for the tip. Is there a publicly available dictionary of morphologies that I could use? Or did you build your own?
--- On Mon, 4/19/10, Darren Govoni <dar...@ontrenet.com> wrote:

> From: Darren Govoni <dar...@ontrenet.com>
> Subject: Re: LucidWorks Solr
> To: solr-user@lucene.apache.org
> Date: Monday, April 19, 2010, 7:39 AM
>
> Regarding stemmers, I ditched them altogether a long time ago in favor
> of a dictionary of morphologies of all known words (for any given
> language). A simple lookup of any word morphology thus produces the set,
> including the correct stem.
>
> Works great. 100% of the time.
>
> Just a tip from me.
>
> On Mon, 2010-04-19 at 00:36 -0800, MitchK wrote:
>
> > Andy, I think it is important to know what a stemmer really is.
> >
> > It reduces words to their infinitives. Those infinitives are not
> > always the real infinitive; for the system, however, each one is an
> > infinitive, since all its derived forms can be reduced to the same form.
> > That's a stemmer.
> >
> > Accordingly, there can't be one stemmer for every language, because
> > every language has its own rules for how to reduce a word to its
> > infinitive.
> >
> > If you apply a stemmer for the English language to a German document,
> > the results might be unexpected. However, sometimes it still works
> > well enough.
> >
> > Keep in mind that this is an algorithm. It is not important whether the
> > created infinitive is the real infinitive. It is only important that
> > most of the derived forms can be reduced to the same basic form. Please
> > ask if something is not clear.
> >
> > KStem:
> > The wiki[1] says that KStem is less aggressive than the standard stemmer.
> > I guess this means that there are more rules for how to reduce a word
> > to its infinitive, and that the results might therefore be better.
> >
> > [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
> >
> > Kind regards
> > - Mitch
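
For illustration, the dictionary lookup described in the quoted message could be sketched roughly as below. This is only a minimal sketch under assumptions: the two-entry word list and the names MORPHOLOGY, FORM_TO_STEM and lookup are made up for the example and do not come from any real morphology dictionary or from Solr.

```python
# Hypothetical, hand-made dictionary: each stem maps to its known surface forms.
# A real dictionary would be loaded from a file and cover the whole language.
MORPHOLOGY = {
    "run": {"run", "runs", "running", "ran"},
    "go": {"go", "goes", "going", "went", "gone"},
}

# Invert the dictionary once so that any surface form can be looked up directly.
FORM_TO_STEM = {
    form: stem
    for stem, forms in MORPHOLOGY.items()
    for form in forms
}


def lookup(word):
    """Return (stem, set of known forms) for a word, or None if it is unknown."""
    stem = FORM_TO_STEM.get(word.lower())
    if stem is None:
        return None
    return stem, MORPHOLOGY[stem]


if __name__ == "__main__":
    print(lookup("went"))
    print(lookup("blorf"))
```

Irregular forms such as "went" resolve to their stem here because they are listed explicitly, which a rule-based suffix stripper would not manage. Words missing from the dictionary return None in this sketch, so some fallback would still be needed for them.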