On 4/21/10 1:43 PM, Walter Underwood wrote:
On Apr 21, 2010, at 10:30 AM, Mark Miller wrote:

But they don't usually call 'non-algorithmic' stemming 'stemming'. Stemming 
usually means using a simple heuristic process. When you use vocabulary and 
morphology, it's usually called lemmatization rather than stemming.

"stemmer" is jargon that does not have a precise definition.
Usually, as the Wikipedia article Robert linked to states, stemming is done without knowledge of the word's context. With stemming you are not necessarily finding lemmas, just stems. A stem can be anything, as long as the same word always stems to the same thing; a lemma is more than that. I don't think the definition is super precise, but I also wouldn't call it jargon.
For example, the LinguistX morphological analyzers are called "stemmers" and 
they provide options that are dictionary-based inflectional, dictionary-based 
derivational, and algorithmic. You can also combine those, so you can get accurate 
dictionary-based stems, then use an algorithmic stemmer on words not in the dictionary.

That just sounds like a mix of stemming and lemmatization.
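
To make the "mix" concrete, here is a rough Python sketch of the combination described above: look the word up in a dictionary first, and fall back to an algorithmic stemmer only for words the dictionary does not know. Everything in it is an assumption made for illustration: the LEMMAS table, the SUFFIXES rules, and the function names are hypothetical and are not taken from LinguistX or any real analyzer.

```python
# Hypothetical toy dictionary: inflected form -> lemma (illustrative only).
LEMMAS = {"ran": "run", "mice": "mouse", "better": "good"}

# Naive suffix rules, tried in order. Far cruder than a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def algorithmic_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(word):
    """Dictionary lookup first; heuristic stemming for unknown words."""
    word = word.lower()
    if word in LEMMAS:
        return LEMMAS[word]        # lemmatization: result is a real word
    return algorithmic_stem(word)  # stemming: result is just a stable key


print(normalize("mice"))     # mouse (from the dictionary)
print(normalize("walked"))   # walk  (suffix rule)
print(normalize("running"))  # runn  (a stem, not a lemma)
```

The last call shows the distinction: "runn" is not a word, but it is a perfectly good stem as long as "running" always maps to it.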

- Mark
