> I was previously using the PorterStemmer to do stemming and ran into
> an issue where it was overly aggressive with some words or
> abbreviations, which I needed to prevent. I have recently switched to
> KStem and I believe it is less of an issue, but I was still wondering
> whether there is a way to set a list of stop words for which you
> don't want stemming to occur, or a way to tell the stemmer to store
> the unstemmed version as well. For instance, if a query came in for
> "Ahmed", the PorterStemmer would turn that into "Ahm", while in this
> case Ahmed is a name and I want to search for it unstemmed. If there
> were a stop word list, I could compile a list of words I don't want
> stemmed; or, if there were a way to also create a token for the
> unstemmed word, what went into the index for Ahmed would be "ahmed"
> and "ahm", so we'd cover both cases. What are the drawbacks of
> providing both?
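StemmerOverrideFilterFactory and KeywordMarkerFilterFactory are used for this kind of purpose. KeywordMarkerFilterFactory marks the words listed in a protected-words file as keywords, so a stemmer later in the chain leaves them untouched. StemmerOverrideFilterFactory applies your own word-to-stem mappings from a dictionary file and likewise keeps the stemmer from changing those tokens. Both have to be placed before the stemmer in the analyzer chain. See http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming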
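
As a rough sketch of how this might look in schema.xml: the field type name "text_stem_protected" and the file names protwords.txt and stemdict.txt below are placeholders chosen for illustration, and the tokenizer and lowercase filter are just one plausible chain, so adapt them to your existing analyzer.

```xml
<fieldType name="text_stem_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Words listed in protwords.txt (one per line, e.g. "ahmed") are
         marked as keywords, so the stemmer below leaves them unchanged. -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- stemdict.txt holds tab-separated "word<TAB>stem" pairs; a matching
         token is replaced by the given stem and then skipped by the stemmer. -->
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt"/>
    <!-- The stemmer must come after the two filters above. -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the lowercase filter runs first in this sketch, the entries in both files would need to be lowercase. With "ahmed" in protwords.txt, "Ahmed" should reach the index as "ahmed" rather than a stemmed form. You will need to reindex after changing the analyzer.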