Re: LucidWorks Solr

MitchK Mon, 19 Apr 2010 01:37:05 -0700

Andy, I think it is important to know what a stemmer really is.

It reduces words to a common base form (the stem), which I will call the
infinitive here. It is not always the real infinitive, but for the system it
acts as one, since all derived forms of a word can be reduced to the same form.
That's a stemmer.
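
To make that concrete, here is a minimal sketch of a suffix-stripping stemmer.
It is a toy written only for illustration: the rule list and the name toy_stem
are my own assumptions, and it is neither the Porter stemmer nor KStem.

```python
# Toy suffix-stripping stemmer, for illustration only.
# The suffix list is a hypothetical rule set, not taken from any real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")  # "es" must be tried before "s"


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["connected", "connecting", "connects", "argued", "argues", "arguing"]:
        print(w, "->", toy_stem(w))
```

With these rules, "connected", "connecting" and "connects" all become
"connect", while "argued", "argues" and "arguing" all become "argu". The
second stem is not a real word, but the forms still meet at one place, which
is all the index needs. Note that "argue" itself stays "argue" under these toy
rules, so not every form is caught.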


It follows that there cannot be a single stemmer that works for every
language, because every language has its own rules for how to reduce a word
to its infinitive.

If you apply a stemmer for English to a German document, the results might
be unexpected. However, sometimes it still works well enough.
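
As a rough sketch of what can happen, the snippet below runs English-style
suffix rules over a few German words. The rule list and the name
english_toy_stem are assumptions made up for this example; a real English
stemmer has many more rules and may behave differently on these words.

```python
# Toy English-style suffix stripper, for illustration only.
# The suffix list is a hypothetical rule set, not a real English stemmer.
ENGLISH_SUFFIXES = ("ing", "ed", "es", "s")  # "es" must be tried before "s"


def english_toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in ENGLISH_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    # Three forms of the German word "Haus", then two forms of "Auto".
    for w in ["Haus", "Hauses", "Häuser", "Auto", "Autos"]:
        print(w, "->", english_toy_stem(w))
```

Here "Haus", "Hauses" and "Häuser" end up as "hau", "haus" and "häuser", so
three forms of one word no longer match each other. "Auto" and "Autos" both
become "auto", which is the case where the wrong stemmer happens to work.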

Keep in mind that this is an algorithm. It is not important whether the
created infinitive is the real infinitive. It is only important that most of
the derived forms can be reduced to the same base form. Please ask if
something is not clear.

KStem:
The wiki[1] says that KStem is less aggressive than the standard stemmer.
I guess this means that it has more rules for how to reduce a word to its
infinitive, and so the results might be better.


[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

Kind regards
- Mitch
