Shalin Shekhar Mangar wrote:
On Mon, May 11, 2009 at 2:46 PM, Michael Ludwig <m...@as-guides.com> wrote:
Could you give an example of how the spellcheck.q parameter can be
brought into play to take non-ASCII characters into account, so that
"Käse" isn't mishandled, given the following example:
You will need to configure your field with a tokenizer and filters
that handle your language correctly. Look at the GermanAnalyzer
in Lucene contrib-analysis. It uses StandardTokenizer, StandardFilter,
LowerCaseFilter, StopFilter and GermanStemFilter with a custom stopword
list.
Hello Shalin,
thanks for your kind answer, and sorry for the delay in responding.
Being new to this domain, I phrased my question badly. What I meant
(and I think Jonathan did too) is that the regular expression in
SpellingQueryConverter only handles ASCII letters, which is
insufficient for most languages, including French and German.
I think the regular expression in SpellingQueryConverter should be
something like:

(?:(?!(\w+:|\d+)))[\p{javaLowerCase}\p{javaUpperCase}\d_]+

instead of:

(?:(?!(\w+:|\d+)))\w+
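To make the difference concrete, here is a small, self-contained Java sketch. It only uses java.util.regex and applies the two patterns quoted above directly with Matcher.find(); it does not go through Solr's SpellingQueryConverter, so it only approximates what the converter does. The class and method names (SpellingRegexDemo, tokens) are hypothetical. In Java's default (non-Unicode) mode, \w means [a-zA-Z_0-9], so a word like "Käse" gets split at the umlaut, while \p{javaLowerCase} and \p{javaUpperCase} follow Character.isLowerCase/isUpperCase and accept accented letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpellingRegexDemo {
    // Current pattern as quoted in this thread: \w is ASCII-only by default in Java.
    static final Pattern ASCII_ONLY =
        Pattern.compile("(?:(?!(\\w+:|\\d+)))\\w+");

    // Proposed pattern: word characters are any Java lower/upper-case letter, digit or '_'.
    static final Pattern UNICODE_AWARE =
        Pattern.compile("(?:(?!(\\w+:|\\d+)))[\\p{javaLowerCase}\\p{javaUpperCase}\\d_]+");

    // Hypothetical helper: collects every match of the pattern in the query string.
    static List<String> tokens(Pattern p, String query) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(query);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "Käse crème brûlée", written with Unicode escapes to avoid source-encoding issues.
        String query = "K\u00e4se cr\u00e8me br\u00fbl\u00e9e";
        // ASCII-only:    [K, se, cr, me, br, l, e]
        System.out.println("ASCII-only:    " + tokens(ASCII_ONLY, query));
        // Unicode-aware: [Käse, crème, brûlée]
        System.out.println("Unicode-aware: " + tokens(UNICODE_AWARE, query));
    }
}
```

Note that in the proposed pattern the lookahead still uses ASCII \w, so a field prefix such as "price:" is skipped, but a non-ASCII field name like "größe:" would not be recognized as a field and would come through as a token. On Java 7 and later, compiling the original pattern with Pattern.UNICODE_CHARACTER_CLASS is another way to make \w itself Unicode-aware.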
With that change, the example program I posted generates correct
German and French TokenStreams.
But I may well have misunderstood the purpose of this class. You
will know.
Michael Ludwig