> I was previously using the PorterStemmer for stemming and ran into an
> issue where it was overly aggressive with some words or abbreviations,
> which I needed to stop. I have recently switched to KStem and I believe
> the issue is reduced, but I was still wondering whether there is a way
> to set a list of stop words for which stemming should not occur, or a
> way to tell the stemmer to store the unstemmed version as well. For
> instance, if a query came in for "Ahmed", the PorterStemmer would turn
> that into "Ahm", while in this case Ahmed is a name and I want to
> search for it unstemmed. If there were a stop word list, I could
> compile a list of words I didn't want stemmed; alternatively, if there
> were a way to also create a token for the unstemmed word, what went
> into the index for Ahmed would be "ahmed" and "ahm", so we'd cover both
> cases. What are the drawbacks of providing both?

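StemmerOverrideFilterFactory and KeywordMarkerFilterFactory are used for this
kind of purpose. KeywordMarkerFilterFactory protects the words you list from
being stemmed at all, while StemmerOverrideFilterFactory lets you supply your
own stem for specific words. Both must be placed before the stemmer in the
analyzer chain.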
http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming
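
As a rough sketch of how this might look in schema.xml (the field type name
"text_protected" and the file names protwords.txt and stemdict.txt are only
illustrative assumptions, and the tokenizer and stemmer should be swapped for
whatever you already use):

```xml
<!-- Hypothetical field type; the name and file names are assumptions. -->
<fieldType name="text_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase first, so entries in the word lists can be lowercase. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Words listed in protwords.txt (one per line, e.g. "ahmed")
         are marked as keywords and skipped by the stemmer below. -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- stemdict.txt holds tab-separated "word<TAB>stem" pairs that
         replace whatever the stemmer would otherwise produce. -->
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt"/>
    <!-- The stemmer must come after the two filters above. -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With "ahmed" in protwords.txt, a query for "Ahmed" should be lowercased and
then left unstemmed. You would normally need only one of the two filters;
both are shown here just to illustrate where they sit. The same analyzer
should be applied at index and query time, and existing documents need to be
reindexed after the change.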



