Shalin, yeah. I guess in my opinion, handling diacritics in conjunction
with a stemmer is unfortunately not very easy to do without getting weird
results.
For example, the Snowball stemmers usually expect these diacritics to be
there, they are looking for something closer to the proper "dicti
I wasn't suggesting that they should be changed but trying to understand
why. This makes sense. Thanks Erik and Robert.
On Mon, Feb 22, 2010 at 6:16 AM, Robert Muir wrote:
right, most stemmers expect the diacritics to be in their input to work
correctly, too.
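
To illustrate the ordering issue, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: toy_stem is a hypothetical one-rule suffix stripper, not the Snowball algorithm, and fold_diacritics is only a rough stand-in for what ASCIIFoldingFilter does.

```python
import unicodedata

E_ACUTE = "\u00e9"  # precomposed "e with acute accent"


def fold_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (rough stand-in for ASCII folding)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(word: str) -> str:
    """Hypothetical stemmer: strip an accented participle ending if one is present.

    The rules only match when the accent is still in the input, which is the
    point being illustrated.
    """
    suffixes = (E_ACUTE + "es", E_ACUTE + "s", E_ACUTE + "e", E_ACUTE)
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


word = "parl" + E_ACUTE + "e"

# Stem first, fold afterwards: the accented suffix is recognized and removed.
stem_then_fold = fold_diacritics(toy_stem(word))

# Fold first, stem afterwards: the suffix rule no longer matches anything.
fold_then_stem = toy_stem(fold_diacritics(word))

print(stem_then_fold, fold_then_stem)
```

In this sketch the two orderings produce different terms for the same input, because folding first removes the very characters the suffix rule looks for. A real stemmer has many more rules, so the effect will vary by language and word.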
On Sun, Feb 21, 2010 at 5:19 PM, Erik Hatcher wrote:
won't some stemmers leave diacritics in the terms that ought to be
removed before indexing?
On Feb 21, 2010, at 4:57 PM, Shalin Shekhar Mangar wrote:
Hello,
Looking over the CharFilter franchise, it seems to me that the
ASCIIFoldingFilter is a perfect candidate for being a CharFilter as it