Why do these approaches have to be mutually exclusive? Do a dictionary lookup first and, if no satisfactory match is found, fall back to an algorithmic stemmer. You would probably save a few CPU cycles by running the algorithmic stemmer only when necessary.
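
To make the idea concrete, here is a minimal sketch of the lookup-then-fallback order, not a real implementation. Everything in it is made up for illustration: `LEMMA_DICT` is a hypothetical three-entry dictionary, and `algorithmic_stem` is a toy suffix stripper standing in for a real algorithmic stemmer such as Porter.

```python
# Hypothetical, tiny exception dictionary; a real one would be loaded from a
# lexicon file and would be far larger.
LEMMA_DICT = {
    "ran": "run",
    "mice": "mouse",
    "better": "good",
}

# Toy suffix rules, checked in order. This is NOT the Porter algorithm, only a
# stand-in so the fallback path has something to run.
SUFFIXES = ("ing", "ed", "es", "s")


def algorithmic_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def hybrid_stem(word):
    """Try the dictionary first; run the algorithmic stemmer only on a miss."""
    word = word.lower()
    hit = LEMMA_DICT.get(word)
    if hit is not None:
        return hit
    return algorithmic_stem(word)


if __name__ == "__main__":
    for w in ("mice", "Ran", "googling", "tweets"):
        print(w, "->", hybrid_stem(w))
```

Irregular forms such as "mice" are resolved by the dictionary, while words the dictionary has never seen, such as "googling" or "tweets", still get handled by the fallback rules. What counts as a "satisfactory" match is left open here; this sketch simply treats any dictionary hit as satisfactory.
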
On Wed, Apr 21, 2010 at 1:31 PM, Robert Muir <rcm...@gmail.com> wrote:
> sy to look at the "faults" of some algorithmic stemmer, in truth its
> only purpose is to cause related forms of the word to conflate to the same
> form, and hopefully avoiding unrelated terms from conflating to this form.
>
> A dictionary-based stemmer is out-of-date the day you put it into
> production: languages aren't static. For example, you can't expect a
> dictionary-based stemmer to properly deal with forms like "googling" or
> "tweets" that have recently slipped into English vocabulary, but an
> algorithmic stemmer will likely deal with these just fine.