Otis, I think this is a great idea.
You could also go even further by making a better example for StemmerOverrideFilter (stemdict.txt) (http://wiki.apache.org/solr/LanguageAnalysis#solr.StemmerOverrideFilterFactory), for example:

animated<tab>animate
animation<tab>animation
animations<tab>animation

This might be a bit better (but more work!) than protected words, since then you could let animation and animations conflate, rather than just forcing them all to stay unchanged. I wouldn't go crazy and worry about animator matching animation etc., but I would at least let plural forms match the singular, without screwing other things up.

On Fri, Jul 30, 2010 at 4:41 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Hello,
>
> I'm looking for a list of English words that, when stemmed by Porter stemmer,
> end up in the same stem as some similar, but unrelated words. Below are some
> examples:
>
> # this gets stemmed to "iron", so if you search for "ironic", you'll get "iron"
> matches
> ironic
>
> # same stem as animal
> anime
> animated
> animation
> animations
>
> I imagine such a list could be added to the example protwords.txt
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/

--
Robert Muir
rcm...@gmail.com
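The stemdict.txt idea boils down to this: look each term up in a tab-separated override dictionary first, and only hand it to the stemmer when there is no entry. Below is a minimal, hypothetical Java sketch of that lookup order. The class name StemOverrideSketch and the "toy" fallback stemmer are illustrative assumptions. This is not Lucene's StemmerOverrideFilter code, and the toy stemmer is not the real Porter stemmer; it only mimics the collapse of the "anim..." words onto one stem described in the quoted message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch (not Lucene's implementation): an override dictionary
 * consulted before a fallback stemmer, in the spirit of StemmerOverrideFilter.
 */
public class StemOverrideSketch {
    private final Map<String, String> overrides = new HashMap<>();
    private final UnaryOperator<String> fallbackStemmer;

    /** Parses stemdict-style text: one "term<TAB>stem" pair per line. */
    public StemOverrideSketch(String dictionaryText, UnaryOperator<String> fallbackStemmer) {
        this.fallbackStemmer = fallbackStemmer;
        for (String line : dictionaryText.split("\n")) {
            String trimmed = line.strip();
            // Skip blank lines and comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\t", 2);
            if (parts.length == 2) {
                overrides.put(parts[0], parts[1]);
            }
        }
    }

    /** Returns the override stem if one exists, otherwise the fallback stemmer's output. */
    public String stem(String term) {
        String override = overrides.get(term);
        return override != null ? override : fallbackStemmer.apply(term);
    }

    public static void main(String[] args) {
        String stemdict = "animated\tanimate\nanimation\tanimation\nanimations\tanimation\n";
        // Toy fallback (assumption): collapses every "anim..." word to "anim".
        // It is NOT the Porter stemmer; it only imitates the conflation Otis describes.
        UnaryOperator<String> toyStemmer = t -> t.startsWith("anim") ? "anim" : t;
        StemOverrideSketch stemmer = new StemOverrideSketch(stemdict, toyStemmer);
        for (String word : new String[] {"animal", "animated", "animation", "animations"}) {
            System.out.println(word + " -> " + stemmer.stem(word));
        }
        // Prints:
        // animal -> anim
        // animated -> animate
        // animation -> animation
        // animations -> animation
    }
}
```

The output shows the trade-off Robert describes: animation and animations still conflate with each other, but they no longer share a stem with animal, which a protected-words list alone would not give you.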