On 4/21/10 1:43 PM, Walter Underwood wrote:
On Apr 21, 2010, at 10:30 AM, Mark Miller wrote:
But they don't usually call 'non algorithmic' stemming 'stemming'. Stemming
usually means using a simple heuristic process. When you use vocabulary and
morphology, it's usually called lemmatization rather than stemming.
"stemmer" is jargon that does not have a precise definition.
Usually, as the Wikipedia article Robert linked to states, stemming is
done without knowledge of the context of the word. With stemming you are
not necessarily finding lemmas - just stems. Stems can be anything as
long as the same word always stems to the same thing - lemmas are more
than that. I don't think the definition is super precise, but I also
wouldn't call it jargon.
For example, the LinguistX morphological analyzers are called "stemmers" and
they provide options that are dictionary-based inflectional, dictionary-based
derivational, and algorithmic. You can also combine those, so you can get accurate
dictionary-based stems, then use an algorithmic stemmer on words not in the dictionary.
That just sounds like a mix of stemming and lemmatization.
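
To make that concrete, here is a rough Python sketch of the combination
described above: a dictionary lookup first, then an algorithmic fallback for
words not in the dictionary. Everything in it is an assumption for
illustration (the tiny LEMMAS table, the SUFFIXES list, and the function
names are made up); it is not how LinguistX or any real analyzer is
implemented.

```python
# Hypothetical toy dictionary; a real lemmatizer relies on a full
# vocabulary plus morphological analysis.
LEMMAS = {"ran": "run", "running": "run", "mice": "mouse", "better": "good"}

# Hypothetical suffix list for the heuristic step, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def algorithmic_stem(word):
    """Heuristic suffix stripping: no dictionary and no context.

    The result only has to be consistent for the same input;
    it does not have to be a real word.
    """
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def combined_stem(word):
    """Dictionary lookup first, algorithmic fallback for unknown words."""
    word = word.lower()
    return LEMMAS.get(word, algorithmic_stem(word))
```

With this toy data, combined_stem("mice") comes back as "mouse" from the
dictionary, while combined_stem("indexing") falls through to the suffix rule
and gives "index". Note that algorithmic_stem("running") yields "runn", which
is a perfectly fine stem but not a lemma.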
- Mark