On Wed, Apr 21, 2010 at 1:49 PM, Mark Miller <markrmil...@gmail.com> wrote:
> > I believe that's covered by morphology?
> >

The problem is that a morphological analyzer typically emits multiple
solutions, each of which includes a POS. So morphology can tell you that
"building" has two solutions: the gerund form, which you might stem to
"build", or the noun form, which you would stem to "building".

But you need more stuff (POS tagging, etc.) to decide which one to pick
to arrive at a lemma... and if your users are entering very short
queries, you can see how this could be inaccurate, since there isn't
much context.

So what snowball does (simply stemming build, building, buildings all to
"build") might seem silly at first, but you can see how it avoids this
entire mess.

-- 
Robert Muir
rcm...@gmail.com
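
To make the ambiguity above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the ANALYSES table is hand-written toy data rather than the output of a real morphological analyzer, and crude_stem is a hypothetical three-suffix stripper standing in for a stemmer, not the actual Snowball algorithm.

```python
# Toy data, not output from a real analyzer: each surface form maps to
# every (lemma, part-of-speech) solution an analyzer might emit for it.
ANALYSES = {
    "build": [("build", "VERB")],
    "building": [("build", "VERB"), ("building", "NOUN")],
    "buildings": [("building", "NOUN")],
}


def candidate_lemmas(word):
    """Return every candidate lemma; picking one requires POS context."""
    solutions = ANALYSES.get(word, [(word, "UNKNOWN")])
    return sorted({lemma for lemma, _pos in solutions})


def crude_stem(word):
    """Strip a few suffixes without looking at POS (a stand-in, not Snowball)."""
    for suffix in ("ings", "ing", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ("build", "building", "buildings"):
    print(w, candidate_lemmas(w), crude_stem(w))
```

For "building" the lookup returns two candidate lemmas, so a one-word query gives nothing to disambiguate with, while the crude stemmer maps all three forms to the same term without needing any context.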