Re: Why ASCIIFoldingFilter is not a CharFilter

2010-02-22 Thread Robert Muir
Shalin, yeah. I guess, in my opinion, handling diacritics in conjunction with a stemmer is unfortunately not easy to do without getting weird results. For example, the Snowball stemmers usually expect these diacritics to be there; they are looking for something closer to the proper "dicti
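A small illustration of why the order matters. In Lucene a CharFilter rewrites the character stream before the tokenizer runs, so folding there happens before every TokenFilter, including the stemmer. The sketch below is not Lucene code. `fold` is a rough stand-in for ASCIIFoldingFilter (NFD decomposition plus stripping combining marks, which covers far fewer characters than the real filter), and `toyStem` is a hypothetical suffix stripper whose rules are written with accents, the way a real Snowball stemmer expects accented input.

```java
import java.text.Normalizer;

public class FoldingOrderDemo {

    // Rough stand-in for ASCIIFoldingFilter: decompose to NFD, then drop
    // combining marks (e.g. "é" -> "e" + U+0301 -> "e").
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Hypothetical toy stemmer (NOT Snowball): strips a few French-like
    // suffixes that are spelled WITH diacritics, so it only works on
    // unfolded input. Longest suffixes are tried first.
    static String toyStem(String token) {
        String[] suffixes = {"\u00e9es", "\u00e9e", "\u00e9s", "\u00e9", "er"};
        for (String suffix : suffixes) {
            // Keep at least a three-character stem.
            if (token.endsWith(suffix) && token.length() > suffix.length() + 2) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Folding first: what a folding CharFilter would effectively do.
    static String foldThenStem(String token) {
        return toyStem(fold(token));
    }

    // Folding last: ASCIIFoldingFilter placed after the stemmer in the chain.
    static String stemThenFold(String token) {
        return fold(toyStem(token));
    }

    public static void main(String[] args) {
        String[] words = {"aimer", "aim\u00e9", "aim\u00e9es"};
        for (String w : words) {
            System.out.println(w + ": fold->stem=" + foldThenStem(w)
                    + ", stem->fold=" + stemThenFold(w));
        }
    }
}
```

With folding last, all three forms conflate to "aim". With folding first, the accented suffixes are gone before the stemmer sees them, so "aimé" and "aimées" stay as "aime" and "aimees" and no longer match "aimer".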

Re: Why ASCIIFoldingFilter is not a CharFilter

2010-02-22 Thread Shalin Shekhar Mangar
I wasn't suggesting that they should be changed but trying to understand why. This makes sense. Thanks Erik and Robert. On Mon, Feb 22, 2010 at 6:16 AM, Robert Muir wrote: > right, most stemmers expect the diacritics to be in their input to work > correctly, too. > > On Sun, Feb 21, 2010 at 5:19

Re: Why ASCIIFoldingFilter is not a CharFilter

2010-02-21 Thread Robert Muir
right, most stemmers expect the diacritics to be in their input to work correctly, too. On Sun, Feb 21, 2010 at 5:19 PM, Erik Hatcher wrote: > won't some stemmers leave diacritics in the terms that ought to be removed > before indexing? > > > > On Feb 21, 2010, at 4:57 PM, Shalin Shekhar Mangar w

Re: Why ASCIIFoldingFilter is not a CharFilter

2010-02-21 Thread Erik Hatcher
won't some stemmers leave diacritics in the terms that ought to be removed before indexing? On Feb 21, 2010, at 4:57 PM, Shalin Shekhar Mangar wrote: Hello, Looking over the CharFilter franchise, it seems to me that the ASCIIFoldingFilter is a perfect candidate for being a CharFilter as it