Otis,

I think this is a great idea.

You could go even further by making a better example for
StemmerOverrideFilter (stemdict.txt):
http://wiki.apache.org/solr/LanguageAnalysis#solr.StemmerOverrideFilterFactory

For example (where <tab> stands for a literal tab character):
animated <tab> animate
animation <tab> animation
animations <tab> animation

This might be a bit better (but more work!) than protected words, since you
could then let animation and animations conflate rather than forcing them
all to stay unchanged. I wouldn't go crazy and worry about animator
matching animation, etc., but I would at least let plural forms match the
singular without screwing other things up.
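
To make the difference concrete, here is a rough sketch in Python. It is not
Solr's or Lucene's implementation, only the lookup idea: an override entry
replaces the stemmer's output for that word, while a protected word is simply
passed through unchanged. The names load_overrides, toy_stem and analyze are
made up for this illustration, and toy_stem is a crude stand-in (it just
truncates to four letters to mimic over-conflation); it is NOT the Porter
stemmer.

```python
def load_overrides(text):
    """Parse tab-separated 'word<TAB>stem' lines into a dict."""
    overrides = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, stem = line.split("\t", 1)
        overrides[word.strip()] = stem.strip()
    return overrides


def toy_stem(word):
    """Hypothetical stand-in stemmer (NOT Porter): keep the first 4 letters."""
    return word[:4]


def analyze(word, stem, overrides=None, protected=frozenset()):
    """Return the indexed form of a word.

    An override wins first, then protection, then the stemmer.
    """
    if overrides and word in overrides:
        return overrides[word]
    if word in protected:
        return word
    return stem(word)


# Same entries as the stemdict example above, with real tab characters.
STEMDICT = "animated\tanimate\nanimation\tanimation\nanimations\tanimation\n"
overrides = load_overrides(STEMDICT)
protected = {"animated", "animation", "animations"}

words = ["animal", "animated", "animation", "animations"]
with_protected = [analyze(w, toy_stem, protected=protected) for w in words]
with_overrides = [analyze(w, toy_stem, overrides=overrides) for w in words]

print(with_protected)
print(with_overrides)
```

With protected words, animation and animations stay distinct terms; with the
override entries, both end up as animation, and neither collides with animal.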

On Fri, Jul 30, 2010 at 4:41 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hello,
>
> I'm looking for a list of English words that, when stemmed by Porter
> stemmer, end up in the same stem as some similar, but unrelated words.
> Below are some examples:
>
> # this gets stemmed to "iron", so if you search for "ironic", you'll get
> "iron" matches
> ironic
>
> # same stem as animal
> anime
> animated
> animation
> animations
>
> I imagine such a list could be added to the example protwords.txt
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>


-- 
Robert Muir
rcm...@gmail.com
