Can you share your spellchecker setup and the code for the test case?
I would like to reproduce it and see what's going on.
On Oct 7, 2008, at 2:18 PM, Jason Rennie wrote:
On Tue, Oct 7, 2008 at 11:56 AM, Grant Ingersoll
<[EMAIL PROTECTED]> wrote:
Is there any way you can write up a small test case? This definitely
sounds like a bug.
I tried adding single-word documents according to the top ten
suggestions and frequencies for "chanl". That is, I created a fresh
index, then added 834 "chanel" docs; 10 "chant" docs; 8 "chang" docs;
4 "chani" docs; 1 doc each of "chand", "chana", "charl" and "chane";
106 docs of "chan"; and 1950 docs of "chair". The fact that "chan"
would come after the single-freq terms seems wrong to me.
I'm guessing the "FuzzyQuery score"
(http://wiki.apache.org/jakarta-lucene/SpellChecker) may be the reason
for some of the weird results I'm seeing. Based on what I've seen, and
according to the SpellChecker wiki, it sounds like ordering is done
first by this FuzzyQuery score ((edit distance)/(length of word)), then
by popularity. This seems to explain "chan" coming after "chand"
(above), "candyâ" coming before "candy" and "yell" coming before
"yello".
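To make the ordering hypothesis above concrete, here is a minimal
Python sketch that sorts the test-case terms by an edit-distance score
first and by document frequency second. It is not Lucene's code: the
helpers levenshtein, similarity and rank are hypothetical names, and
normalizing the distance by the length of the shorter word is an
assumption, chosen only because it reproduces the order reported for
"chanl". The wiki describes the score only as
(edit distance)/(length of word).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(query: str, candidate: str) -> float:
    # ASSUMPTION: normalize by the shorter word's length. This is a guess
    # at the scoring, not a statement of what the spellchecker does.
    return 1.0 - levenshtein(query, candidate) / min(len(query), len(candidate))


def rank(query: str, freqs: dict) -> list:
    # Sort by score (descending), then by document frequency (descending).
    return sorted(freqs, key=lambda w: (-similarity(query, w), -freqs[w]))


# Term -> number of docs, as in the test index described above.
FREQS = {
    "chanel": 834, "chant": 10, "chang": 8, "chani": 4,
    "chand": 1, "chana": 1, "charl": 1, "chane": 1,
    "chan": 106, "chair": 1950,
}

if __name__ == "__main__":
    for word in rank("chanl", FREQS):
        print(f"{word:8s} score={similarity('chanl', word):.2f} freq={FREQS[word]}")
```

Under this assumption "chan" scores 0.75 (one edit over four
characters) while the candidates of five or more letters at distance
one score 0.8, so "chan" sorts after all of them no matter how many
docs contain it, and "chair" (two edits) comes last despite its 1950
docs.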
On Tue, Oct 7, 2008 at 11:59 AM, Grant Ingersoll
<[EMAIL PROTECTED]> wrote:
Again, probably b/c of the distance. What distance measure are you using?
I'm not specifying a distance measure.
No, it should run in both cases. Can you reproduce it in a small test
case?
In this test case I created, I searched for "chane" (with
spellcheck=true) and got one result. When I searched for "chanel", it
returned numFound="834". I have "accuracy" set to 0.5. Should the
spellchecker not suggest "chanel" for the "chane" query?
Jason
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ