Hello Walter.
We believe this kind of thing is better managed by a content team that
works with user feedback. It would be costly to build a new stemmer
every time we find a word that brings back irrelevant results. It's a
lot better to create a simple interface that lets anyone define, based
on user feedback, which words need to be protected.
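A minimal sketch of the protected-words idea, not Solr's or Snowball's actual code. It assumes a protwords.txt-style list with one word per line and '#' comment lines. The names load_protected_words, protect and toy_stem are hypothetical, and toy_stem is only a stand-in for a real Snowball stemmer.

```python
from typing import Callable, Iterable, Set


def load_protected_words(lines: Iterable[str]) -> Set[str]:
    """Read one protected word per line, skipping blanks and '#' comments."""
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith("#"):
            words.add(word.lower())
    return words


def protect(stem: Callable[[str], str], protected: Set[str]) -> Callable[[str], str]:
    """Wrap a stemmer so that protected words pass through unchanged."""
    def guarded(token: str) -> str:
        if token.lower() in protected:
            return token
        return stem(token)
    return guarded


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a trailing 's' only."""
    return token[:-1] if token.endswith("s") else token


# The list would normally come from a file the content team edits.
protected = load_protected_words(["# edited by the content team", "lápis", ""])
stem = protect(toy_stem, protected)
print(stem("casas"))  # not protected, so the toy stemmer runs
print(stem("lápis"))  # protected, returned unchanged
```

With this shape, correcting a bad result means adding one line to the word list; the stemmer itself is never rebuilt.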
Erik just said it wouldn't be hard to bring that functionality to
Snowball. Erik, do you know what needs to be done in order to achieve
that? Do you guys have any plans for that? I'm sure that I'm not the
only one with this problem using Solr with Portuguese (or any other
language).
Thank you very much for your help,
Leonardo.
Walter Underwood wrote:
You can define exceptions in the Snowball language and generate
a new stemmer. See the examples here:
http://snowball.tartarus.org/algorithms/english/stemmer.html
wunder
On 2/18/09 9:56 AM, "Erik Hatcher" <e...@ehatchersolutions.com> wrote:
On Feb 18, 2009, at 12:40 PM, Leonardo Dias wrote:
Is there a way to make the snowball algorithm work with a
protwords.txt file?
Currently, and unfortunately, no - the protected words feature is not
available in the SnowballPorterFilterFactory. It wouldn't take much
effort to bring that capability across though.
Erik