Re: French and SpellingQueryConverter

Michael Ludwig Mon, 11 May 2009 02:17:36 -0700

Shalin Shekhar Mangar wrote:

On Fri, May 8, 2009 at 2:14 AM, Jonathan Mamou <ma...@il.ibm.com>
wrote:

SpellingQueryConverter always splits words with special
characters. I think that the issue is in the SpellingQueryConverter
class: Pattern.compile("(?:(?!(\\w+:|\\d+)))\\w+");
According to
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html,
\w A word character: [a-zA-Z_0-9]
I think that special characters should also be added to the regex.
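The splitting can be reproduced with plain java.util.regex, without any Solr classes. The sketch below runs the pattern quoted above and a hypothetical Unicode-aware variant in which \w is replaced by [\p{L}\p{N}_] (letters, numbers, underscore). The variant, the class name and the helper method are illustrative assumptions, not what Solr ships. Each match is printed as (text,start,end); the negative lookahead is what skips field prefixes such as "title:".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryRegexDemo {
    // The pattern quoted above; by default \w is ASCII-only ([a-zA-Z_0-9]).
    static final Pattern ASCII_WORDS =
        Pattern.compile("(?:(?!(\\w+:|\\d+)))\\w+");

    // Hypothetical Unicode-aware variant: \w swapped for letters, numbers and '_'.
    static final Pattern UNICODE_WORDS =
        Pattern.compile("(?:(?!([\\p{L}\\p{N}_]+:|\\d+)))[\\p{L}\\p{N}_]+");

    // Collects every match as "(text,start,end)".
    static List<String> matches(Pattern p, String query) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(query);
        while (m.find()) {
            out.add("(" + m.group() + "," + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        String word = "K\u00e4se"; // "Käse", escaped to avoid source-encoding issues
        System.out.println(matches(ASCII_WORDS, word));   // [(K,0,1), (se,2,4)]
        System.out.println(matches(UNICODE_WORDS, word)); // [(Käse,0,4)]
    }
}
```

On Java 7 and later, compiling the original pattern with the Pattern.UNICODE_CHARACTER_CLASS flag is another way to make \w match non-ASCII letters such as "ä".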


Same issue for the GermanAnalyzer as for the FrenchAnalyzer.

http://wiki.apache.org/solr/SpellCheckComponent says:

  The SpellingQueryConverter class does not deal properly with
  non-ASCII characters. In this case, you have either to use
  spellcheck.q, or to implement your own QueryConverter.

If you use spellcheck.q parameter for specifying the spelling
query, then the field's analyzer will be used (in this case,
FrenchAnalyzer). If you use the q parameter, then the
SpellingQueryConverter is used.
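As a rough illustration of the difference, the sketch below builds a request query string that passes the user's input both as q and as spellcheck.q, so the spelling query goes through the field's analyzer rather than SpellingQueryConverter. It assumes a request handler with SpellCheckComponent already configured in solrconfig.xml; the handler path is not shown, and the class and method names are hypothetical. URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpellcheckQueryParams {
    // Builds "q=...&spellcheck=true&spellcheck.q=..." with UTF-8 percent-encoding.
    // Sending spellcheck.q makes the spellchecker analyze this text itself
    // instead of passing q through SpellingQueryConverter.
    static String params(String userQuery) {
        String enc = URLEncoder.encode(userQuery, StandardCharsets.UTF_8);
        return "q=" + enc + "&spellcheck=true&spellcheck.q=" + enc;
    }

    public static void main(String[] args) {
        // prints q=K%C3%A4se&spellcheck=true&spellcheck.q=K%C3%A4se
        System.out.println(params("K\u00e4se"));
    }
}
```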


Could you give an example of how the spellcheck.q parameter can be
brought into play to take non-ASCII characters into account (so
that "Käse" isn't mishandled), given the following example:

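package org.apache.solr.spelling;
import org.apache.lucene.analysis.de.GermanAnalyzer;
public class GermanTest {
    public static void main(String[] args) {
        SpellingQueryConverter sqc = new SpellingQueryConverter();
        // analyzer is a protected field, hence the class lives in
        // the org.apache.solr.spelling package
        sqc.analyzer = new GermanAnalyzer();
        // convert() splits the input with its word regex, then runs
        // each piece through the analyzer
        System.out.println(sqc.convert("Käse"));
    }
}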

Note that the output of the above, which is plainly wrong, reads:

  [(k,0,1,type=<ALPHANUM>), (se,2,4,type=<ALPHANUM>)]

Thanks.

Michael Ludwig
