Shalin Shekhar Mangar wrote:
On Mon, May 11, 2009 at 2:46 PM, Michael Ludwig <m...@as-guides.com>
wrote:

Could you give an example of how the spellcheck.q parameter can be
brought into play to (take non-ASCII characters into account, so
that "Käse" isn't mishandled) given the following example:

You will need to set the correct tokenizer and filters for your field
which can handle your language correctly. Look at the GermanAnalyzer
in Lucene contrib-analysis. It uses StandardTokenizer, StandardFilter,
LowerCaseFilter, StopFilter, GermanStemFilter with a custom stopword
list.

Hello Shalin,

thanks for your kind answer, and sorry for my delay in responding.

Due to my newbieness in this domain, I misphrased my question. What
I wanted to say (and Jonathan, too, I think) is that the regular
expression in SpellingQueryConverter relies on \w, which in Java
matches only the ASCII characters [a-zA-Z_0-9] by default. That is
insufficient for most languages, including French and German.

I think the regular expression in SpellingQueryConverter should be
something like:

    (?:(?!(\w+:|\d+)))[\p{javaLowerCase}\p{javaUpperCase}\d_]+

instead of the current:

    (?:(?!(\w+:|\d+)))\w+

Then, correct German and French TokenStreams are generated in the
example program I posted.
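
To illustrate the difference, here is a minimal standalone sketch that uses only java.util.regex. It does not call Solr; the class name SpellingRegexDemo and the helper tokens() are made up for this illustration, and it assumes the converter collects tokens by calling Matcher.find() repeatedly on the query string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Solr.
class SpellingRegexDemo {
    // Existing pattern as quoted above: \w only covers [a-zA-Z_0-9].
    static final Pattern ASCII_PATTERN =
        Pattern.compile("(?:(?!(\\w+:|\\d+)))\\w+");

    // Proposed pattern: accepts any character that java.lang.Character
    // classifies as lower case or upper case, plus digits and underscore.
    static final Pattern UNICODE_PATTERN =
        Pattern.compile("(?:(?!(\\w+:|\\d+)))[\\p{javaLowerCase}\\p{javaUpperCase}\\d_]+");

    // Collects every successive match of the pattern in the input
    // (assumed to mirror how the converter extracts its tokens).
    static List<String> tokens(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String word = "K\u00e4se"; // "Käse", escaped to avoid source-encoding issues
        // The ASCII pattern splits the word at the umlaut into "K" and "se".
        System.out.println(tokens(ASCII_PATTERN, word));
        // The proposed pattern keeps the word as a single token.
        System.out.println(tokens(UNICODE_PATTERN, word));
    }
}
```

With the ASCII-only pattern, the umlaut acts as a separator, so the spellchecker would be asked about two fragments rather than the word that was typed.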

But I may well have misunderstood the purpose of this class. You
will know.

Michael Ludwig
