Sure. I just sent the relevant files/code directly to you. Let me know if you don't get them or have any trouble with them.

Jason

On Tue, Oct 7, 2008 at 3:27 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> Can you share your spellchecker setup and the code for the test case? I
> would like to reproduce it and see what's going on.
>
> On Oct 7, 2008, at 2:18 PM, Jason Rennie wrote:
>
>> On Tue, Oct 7, 2008 at 11:56 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>
>>> Is there anyway you can write up a small test case? This definitely
>>> sounds like a bug.
>>
>> I tried adding single word documents according to the top ten suggestions
>> and frequencies for "chanl". I.e. I created a fresh index, then added 834
>> "chanel" docs; 10 "chant" docs; 8 "chang" docs; 4 "chani" docs; 1 doc each
>> of "chand", "chana", "charl" and "chane"; 106 docs of "chan"; and 1950 docs
>> of "chair". The fact that "chan" would come after the single-freq terms
>> seems wrong to me.
>>
>> I'm guessing the "FuzzyQuery score"
>> (http://wiki.apache.org/jakarta-lucene/SpellChecker) may be the reason for
>> some of the weird results I'm seeing. Based on what I've seen and also
>> according to the SpellChecker wiki, it sounds like ordering is done first
>> by this FuzzyQuery score ((edit distance)/(length of word)), then by
>> popularity. This seems to explain "chan" coming after "chand" (above),
>> "candyâ" coming before "candy" and "yell" coming before "yello".
>>
>> On Tue, Oct 7, 2008 at 11:59 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>
>>> Again, probably b/c of the distance. What distance measure are you
>>> using?
>>
>> I'm not specifying a distance measure.
>>
>>> No, it should run in both cases. Can you reproduce in a small test case?
>>
>> In this test case I created, I searched for "chane" (with spellcheck=true)
>> and got one result. When I searched for "chanel", it returned
>> numFound="834". I have "accuracy" set to 0.5. Should the spellchecker not
>> suggest "chanel" for the "chane" query?
>>
>> Jason
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ

--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
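
The ordering discussed in the quoted thread (similarity score first, popularity second) can be sketched as below. This is only an illustration, not Lucene's code: it assumes the score is 1 - (edit distance)/(length of the shorter word), and the names `edit_distance`, `score`, and `rank` are made up for the example. The frequencies are the ones from the test index described in the thread.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def score(query, candidate):
    # ASSUMPTION: normalise by the shorter of the two words.
    return 1.0 - edit_distance(query, candidate) / min(len(query), len(candidate))


def rank(query, freqs):
    # Sort by score (descending), then by document frequency (descending).
    return sorted(freqs, key=lambda w: (-score(query, w), -freqs[w]))


# Document frequencies from the test index described in the thread.
FREQS = {
    "chanel": 834, "chant": 10, "chang": 8, "chani": 4,
    "chand": 1, "chana": 1, "charl": 1, "chane": 1,
    "chan": 106, "chair": 1950,
}

ranking = rank("chanl", FREQS)
for word in ranking:
    print(word, round(score("chanl", word), 3), FREQS[word])
```

Under that assumption, every five- or six-letter candidate one edit away from "chanl" scores 0.8 while "chan" scores 0.75, so "chan" sorts below the single-document terms despite its 106 documents, and "chair" (two edits away) comes last regardless of its frequency.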