Re: Minimum word length for stemming

Jan Høydahl Thu, 31 Jan 2013 14:13:20 -0800

Hi,

I believe each stemmer implementation decides that for itself. At least the 
NorwegianMinimalStemmer has built-in logic which stems certain suffixes only 
if the token is longer than N characters.


If you want external control, you can look at 
http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming and the 
KeywordMarkerFilterFactory, which lets you list a bunch of words you do not want 
the stemmers to touch. I guess you could easily implement your own 
TokenLengthMarkerFilterFactory which marks tokens below a given length as 
keywords, so that the stemmers leave them alone.
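
To make the length idea concrete, here is a rough sketch in plain Java with no 
Lucene dependency. It only shows the gating logic: tokens shorter than a 
minimum length are passed through untouched, the rest go to the stemmer. The 
class name, the method names and the toy "strip a trailing s" stemmer are all 
made up for illustration. A real TokenLengthMarkerFilterFactory would instead 
mark short tokens as keywords inside a token filter, the same way 
KeywordMarkerFilterFactory does for listed words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class LengthGatedStemming {

    /** Hypothetical toy stemmer: strips one trailing "s". Stands in for a real stemmer. */
    public static String toyStem(String token) {
        return token.endsWith("s") ? token.substring(0, token.length() - 1) : token;
    }

    /** True when the token is shorter than minLength and must not be stemmed. */
    public static boolean isProtected(String token, int minLength) {
        return token.length() < minLength;
    }

    /** Applies the stemmer only to tokens that are not protected by the length check. */
    public static List<String> analyze(List<String> tokens, int minLength,
                                       UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(isProtected(token, minLength) ? token : stemmer.apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("gas", "bus", "houses", "cars");
        // With a minimum length of 4, "gas" and "bus" are left alone.
        System.out.println(analyze(tokens, 4, LengthGatedStemming::toyStem));
        // prints [gas, bus, house, car]
    }
}
```

Keeping the length check separate from the stemmer means the threshold can be 
changed without touching any stemmer code, which is the same separation the 
keyword marker approach gives you.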

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 31 Jan 2013, at 17:35, Jamie Johnson <jej2...@gmail.com> wrote:

> Is there a capability to provide a minimum word threshold that must be met
> before a word is analyzed by a stemmer or other language analyzer?
