Re: WordDelimiterFilter splits at non-ASCII chars

Yonik Seeley Tue, 15 Jul 2008 09:29:21 -0700

On Tue, Jul 15, 2008 at 10:29 AM, Stefan Oestreicher
<[EMAIL PROTECTED]> wrote:
> As I understand it, the WordDelimiterFilter should split on case changes, word
> delimiters and changes from character to digit, but it should not
> differentiate between ASCII and multibyte chars. It does, however. The word
> "hälse" (German plural of "neck") gets split into "h", "ä" and "lse", which
> unfortunately renders this filter quite unusable for me. Am I missing
> something or is this a bug?
> I'm using Solr 1.3 built from trunk.


Look for charset issues in the way you communicate with Solr. I just tried this
with the "text" field via Solr's analysis.jsp, and it works fine.

-Yonik
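To illustrate the kind of charset issue meant above, here is a minimal sketch. It assumes a hypothetical mismatch in which the client sends UTF-8 bytes but something along the way decodes them as ISO-8859-1. The reply does not say which charset is involved, so this is only one plausible scenario, not a diagnosis of this particular setup. Under that assumption, "hälse" turns into "hÃ¤lse". The garbled text contains an uppercase letter right after a lowercase one, plus a currency symbol that is not a letter. Those are exactly the case-change and delimiter boundaries that the quoted message says WordDelimiterFilter splits on.

```java
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Hypothetical mismatch: encode as UTF-8 (what the client sends),
    // then decode as ISO-8859-1 (an assumed misconfigured decoder).
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escape avoids depending on the .java file's own encoding.
        String original = "h\u00e4lse"; // "hälse"
        String garbled = garble(original);

        // Show each char with its code point and whether Java treats it as a
        // letter; non-letters and case changes are candidate split points.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X letter=%b upper=%b%n",
                    (int) c, Character.isLetter(c), Character.isUpperCase(c));
        }
    }
}
```

The single char U+00E4 becomes two chars, U+00C3 ("Ã", an uppercase letter) and U+00A4 ("¤", not a letter). That is why checking the encoding end to end is worth doing before suspecting the filter itself.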

