On Fri, Jul 30, 2010 at 4:41 PM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
> I'm looking for a list of English words that, when stemmed by the Porter
> stemmer, end up with the same stem as some similar but unrelated words.  Below are some
> examples:
>
> # this gets stemmed to "iron", so if you search for "ironic", you'll get
> # "iron" matches
> ironic
>
> # same stem as animal
> anime
> animated
> animation
> animations
>
> I imagine such a list could be added to the example protwords.txt

+1

No reason to make everyone come up with their own list.
Unless a good list already exists out there, we could semi-automate
it by running a large corpus through the stemmer and then, for each
stem, listing the original words.  The manual part would be looking at
the output to pick out the collisions between unrelated words (unless
someone has a better idea).
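
Something along these lines might do for the automated half. It is only a
rough sketch in Python: group_by_stem and toy_stem are names made up for
illustration, and toy_stem is a crude suffix stripper standing in for the
real stemmer, NOT the Porter algorithm. For a real run you would pass in an
actual Porter implementation instead (for example the stem method of NLTK's
PorterStemmer, assuming NLTK is installed) and feed it words from a corpus.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def toy_stem(word: str) -> str:
    """Crude stand-in stemmer (NOT Porter): strips one known suffix."""
    for suffix in ("ations", "ation", "ated", "als", "al", "ic", "e", "s"):
        # Only strip if at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(
    words: Iterable[str], stem: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each stem to the sorted distinct words that produced it,
    keeping only stems shared by more than one word."""
    groups = defaultdict(set)
    for word in words:
        lowered = word.lower()
        groups[stem(lowered)].add(lowered)
    return {s: sorted(ws) for s, ws in groups.items() if len(ws) > 1}


# Hypothetical mini "corpus" built from the examples quoted above.
corpus = ["iron", "ironic", "animal", "anime", "animated",
          "animation", "animations", "search"]
candidates = group_by_stem(corpus, toy_stem)

# One line per stem, for a human to review.
for stem_value, originals in sorted(candidates.items()):
    print(stem_value, "<-", ", ".join(originals))
```

The output still needs the manual pass: most groups will be legitimate
inflections of one word, and only the unrelated ones (ironic vs. iron)
belong in protwords.txt.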

-Yonik
http://www.lucidimagination.com
