On Wed, Apr 21, 2010 at 1:49 PM, Mark Miller <markrmil...@gmail.com> wrote:

>
>  I believe that's covered by morphology?
>
>
The problem is that a morphological analyzer typically emits multiple
solutions, each of which includes a part of speech (POS).

So morphology can tell you that "building" has two solutions: the gerund
form which you might stem to "build", or the noun form which you would stem
to "building".
But you need more than morphology (POS tagging, etc.) to decide which one to
pick as the lemma. If your users are entering very short queries, you can see
how this could be inaccurate, since there isn't much context.

So what Snowball does (simply stemming "build", "building", and "buildings"
all to "build") might seem silly at first, but you can see how it avoids this
entire mess.
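
To make the contrast concrete, here is a rough Python sketch. Everything in it
is made up for illustration: the ANALYSES table, the POS tag names, and the
analyze, lemmatize, and crude_stem helpers are hypothetical, and crude_stem is
a toy suffix stripper, not the actual Snowball algorithm.

```python
# Toy data, invented for illustration; not output from any real analyzer.
# Each word maps to its possible (lemma, POS) solutions.
ANALYSES = {
    "build": [("build", "VERB")],
    "building": [("build", "VERB"), ("building", "NOUN")],
    "buildings": [("building", "NOUN")],
}


def analyze(word):
    """Return every (lemma, POS) solution for a word."""
    return ANALYSES.get(word, [(word, "UNKNOWN")])


def lemmatize(word, pos=None):
    """Return candidate lemmas, narrowed by POS when one is supplied.

    Without a POS tag the ambiguity remains, so all candidates come back.
    """
    solutions = analyze(word)
    if pos is not None:
        solutions = [s for s in solutions if s[1] == pos]
    return sorted({lemma for lemma, _ in solutions})


def crude_stem(word):
    """Strip a few suffixes unconditionally (much simpler than Snowball)."""
    for suffix in ("ings", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    print(lemmatize("building"))          # two candidates, no context to choose
    print(lemmatize("building", "NOUN"))  # a POS tag resolves the ambiguity
    print([crude_stem(w) for w in ("build", "building", "buildings")])
```

Without a POS tag, lemmatize has no basis for choosing between the candidates,
which is the short-query situation. The stemmer never has to choose, because
it conflates all three forms at both index time and query time.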

-- 
Robert Muir
rcm...@gmail.com
