Re: French and SpellingQueryConverter

Michael Ludwig Tue, 19 May 2009 05:23:10 -0700

Shalin Shekhar Mangar wrote:

On Mon, May 11, 2009 at 2:46 PM, Michael Ludwig <m...@as-guides.com>
wrote:

Could you give an example of how the spellcheck.q parameter can be
brought into play to (take non-ASCII characters into account, so
that "Käse" isn't mishandled) given the following example:


You will need to set the correct tokenizer and filters for your field
which can handle your language correctly. Look at the GermanAnalyzer
in Lucene contrib-analysis. It uses StandardTokenizer, StandardFilter,
LowerCaseFilter, StopFilter, GermanStemFilter with a custom stopword
list.


Hello Shalin,

Thanks for your kind answer, and sorry for my delay in responding.

Being new to this domain, I misphrased my question. What I wanted
to say (and Jonathan, too, I think) is that the regular expression
in SpellingQueryConverter only matches ASCII word characters, which
is insufficient for most languages, including French and German.

I think the regular expression in SpellingQueryConverter should be
something like:

    (?:(?!(\w+:|\d+)))[\p{javaLowerCase}\p{javaUpperCase}\d_]+
vs. (?:(?!(\w+:|\d+)))\w+

With that change, the example program I posted generates correct
TokenStreams for German and French.
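
To make the difference between the two patterns concrete, here is a
minimal, standalone Java sketch. It only exercises the two regular
expressions quoted above with java.util.regex; it does not touch Solr
or Lucene. The class name SpellingPatternDemo and the helper tokens()
are invented for this illustration and are not part of
SpellingQueryConverter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Solr.
public class SpellingPatternDemo {

    // The \w+ variant: in Java, \w is ASCII-only by default.
    public static final Pattern ASCII_ONLY =
        Pattern.compile("(?:(?!(\\w+:|\\d+)))\\w+");

    // The proposed variant: letters are matched via Character.isLowerCase
    // and Character.isUpperCase, which cover non-ASCII letters too.
    public static final Pattern UNICODE_AWARE =
        Pattern.compile(
            "(?:(?!(\\w+:|\\d+)))[\\p{javaLowerCase}\\p{javaUpperCase}\\d_]+");

    // Collects every substring of input that the pattern matches.
    public static List<String> tokens(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String query = "K\u00e4se"; // "Käse", escaped to avoid encoding issues

        // ASCII-only pattern splits the word at the umlaut: "K" and "se".
        System.out.println(tokens(ASCII_ONLY, query));

        // Unicode-aware pattern keeps the word whole: "Käse".
        System.out.println(tokens(UNICODE_AWARE, query));
    }
}
```

The character class in the second pattern is just one way to widen
the match. Whether SpellingQueryConverter should adopt it as is would
be for the Solr developers to decide.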

But I may well have misunderstood the purpose of this class; you
will know better than I do.

Michael Ludwig
