Re: WordDelimiterFilter splits at non-ASCII chars

Shalin Shekhar Mangar Tue, 15 Jul 2008 09:19:47 -0700

Hi Stefan,

I wrote a test case for the problem you described, but the filter works fine
for me. I used the following definition:


<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="0" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="0" preserveOriginal="0"/>

What configuration are you using? If it is different, please share it so
that I can test with it.
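
For comparison, here is a minimal standalone sketch (not the Solr code, and
not the test case mentioned above) of how a character classifier that only
knows ASCII would produce exactly the "h", "ä", "lse" split, while one based on
java.lang.Character keeps the word whole. The class and method names are
made up for illustration, and the sketch only marks where boundaries fall; it
does not drop delimiters or apply any of the filter options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not WordDelimiterFilter.
public class BoundarySketch {
    enum Kind { LETTER, DIGIT, OTHER }

    // Classify one char, either Unicode-aware or ASCII-only.
    public static Kind kindOf(char c, boolean unicodeAware) {
        if (unicodeAware) {
            if (Character.isLetter(c)) return Kind.LETTER;
            if (Character.isDigit(c)) return Kind.DIGIT;
            return Kind.OTHER;
        }
        // ASCII-only: every char outside a-z, A-Z, 0-9 falls into OTHER.
        boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        if (asciiLetter) return Kind.LETTER;
        if (c >= '0' && c <= '9') return Kind.DIGIT;
        return Kind.OTHER;
    }

    // Cut the input into maximal runs of chars that share the same kind.
    public static List<String> runs(String s, boolean unicodeAware) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= s.length(); i++) {
            boolean atEnd = (i == s.length());
            if (atEnd || kindOf(s.charAt(i), unicodeAware)
                    != kindOf(s.charAt(i - 1), unicodeAware)) {
                out.add(s.substring(start, i));
                start = i;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // \u00e4 is a-umlaut, escaped so the source file encoding does not matter.
        String word = "h\u00e4lse";
        System.out.println(runs(word, true));   // one run: the whole word
        System.out.println(runs(word, false));  // three runs: h, a-umlaut, lse
    }
}
```

If your setup behaves like the ASCII-only branch, it would also be worth
checking that the input reaches the filter with the encoding you expect.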

On Tue, Jul 15, 2008 at 7:59 PM, Stefan Oestreicher <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> As I understand it, the WordDelimiterFilter should split on case changes,
> word delimiters and changes from character to digit, but it should not
> differentiate between ASCII and multibyte chars. It does, however. The word
> "hälse" (German plural of "neck") gets split into "h", "ä" and "lse", which
> unfortunately renders this filter quite unusable for me. Am I missing
> something or is this a bug?
> I'm using Solr 1.3 built from trunk.
>
> TIA,
>
> Stefan Oestreicher
>
>


-- 
Regards,
Shalin Shekhar Mangar.
