Thanks for the tip. Yeah, I think stemming as it stands (the Porter stemmer) confounds search results.
I was also thinking of using my dictionary of 500,000 words, with their complete morphologies and conjugations, to create a synonyms.txt that provides accurate English morphology. Is this a good idea?

Darren

> Hi Darren,
>
> You might want to look at the KStemmer
> (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem)
> instead of the standard PorterStemmer. It essentially has a 'dictionary'
> of exception words where stemming stops if found, so in your case
> president won't be stemmed any further than president (but presidents will
> be stemmed to president). You will have to integrate it into solr
> yourself, but that's straightforward.
>
> HTH
> Brendan
>
>
> On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:
>
>> Hi,
>> It seems to me that because the stemming does not produce
>> grammatically correct stems in many of the cases,
>> search anomalies can occur like the one I am seeing where I have a
>> document with "president" in it and it is returned
>> when I search for "preside", a different word entirely.
>>
>> Is this correct or acceptable behavior? Previous discussions here on
>> stemming, I was told it's ok as long as all the words reduce
>> to the same stem, but when different words reduce to the same stem it
>> seems to affect search results in a "bad way".
>>
>> Darren
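
To make the synonyms.txt idea from the top of this message concrete, here is a rough sketch of how the file could be generated. It assumes the dictionary can be loaded as a mapping from each base word to its inflected forms; that layout, the function name build_synonym_lines, and the sample entries are all made up for illustration. It writes explicit mappings in the "a, b => c" form that Solr's synonym filter accepts, so each inflected form is rewritten to its base word. It does not handle forms that belong to more than one base word, and it presumably only helps if the Porter stemmer is taken out of the analyzer chain, since otherwise the mapped words would still be stemmed afterwards.

```python
def build_synonym_lines(morphology):
    """Turn {lemma: [inflected forms]} into Solr-style 'a, b => lemma' lines."""
    lines = []
    for lemma in sorted(morphology):
        base = lemma.lower()
        # Deduplicate the forms and drop the lemma itself from the left side.
        forms = sorted({f.lower() for f in morphology[lemma]} - {base})
        if forms:
            lines.append(", ".join(forms) + " => " + base)
    return lines


# Hypothetical sample entries; the real 500,000-word dictionary would supply these.
sample = {
    "president": ["president", "presidents"],
    "preside": ["presides", "presided", "presiding"],
}

if __name__ == "__main__":
    print("\n".join(build_synonym_lines(sample)))
```

With the two sample entries, "presidents" maps to "president" and the forms of "preside" map to "preside", so the two words no longer collapse to one term.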