Sure.  I just sent the relevant files/code directly to you.  Let me know if
you don't get them or have any trouble with them.

Jason

On Tue, Oct 7, 2008 at 3:27 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> Can you share your spellchecker setup and the code for the test case?  I
> would like to reproduce it and see what's going on.
>
>
>
>
> On Oct 7, 2008, at 2:18 PM, Jason Rennie wrote:
>
>  On Tue, Oct 7, 2008 at 11:56 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>
>>  Is there any way you can write up a small test case?  This definitely
>>> sounds like a bug.
>>>
>>
>>
>> I tried adding single word documents according to the top ten suggestions
>> and frequencies for "chanl".  I.e. I created a fresh index, then added 834
>> "chanel" docs; 10 "chant" docs; 8 "chang" docs; 4 "chani" docs; 1 doc each
>> of "chand", "chana", "charl" and "chane"; 106 docs of "chan"; and 1950 docs
>> of "chair".  The fact that "chan" would come after the single-freq terms
>> seems wrong to me.
>>
>> I'm guessing the "FuzzyQuery score" (
>> http://wiki.apache.org/jakarta-lucene/SpellChecker) may be the reason for
>> some of the weird results I'm seeing.  Based on what I've seen and also
>> according to the SpellChecker wiki, it sounds like ordering is done first by
>> this FuzzyQuery score ((edit distance)/(length of word)), then by
>> popularity.  This seems to explain "chan" coming after "chand" (above),
>> "candyâ" coming before "candy" and "yell" coming before "yello".
>>
>> On Tue, Oct 7, 2008 at 11:59 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>
>>  Again, probably b/c of the distance.  What distance measure are you
>>> using?
>>>
>>
>>
>> I'm not specifying a distance measure.
>>
>>
>>  No, it should run in both cases.  Can you reproduce in a small test case?
>>>
>>
>>
>> In this test case I created, I searched for "chane" (with spellcheck=true)
>> and got one result.  When I searched for "chanel", it returned
>> numFound="834".  I have "accuracy" set to 0.5.  Should the spellchecker not
>> suggest "chanel" for the "chane" query?
>>
>> Jason
>>
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>
>
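
For reference, the ordering described in the quoted message (score first,
then popularity) can be sketched as follows.  This is a minimal illustration
in Python, not Lucene's SpellChecker code: the score formula
1 - (edit distance)/(length of suggested word) is an assumption taken from the
description in the thread, and the names edit_distance, rank and FREQS are
hypothetical.  The frequencies are the ones from the test index above.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank(query, freqs):
    """Sort candidates by assumed score (descending), then by frequency."""
    def key(word):
        # Assumption: score is normalized by the suggested word's length.
        score = 1.0 - edit_distance(query, word) / len(word)
        return (-score, -freqs[word])
    return sorted(freqs, key=key)


# Document frequencies from the test index described in the thread.
FREQS = {
    "chanel": 834, "chant": 10, "chang": 8, "chani": 4,
    "chand": 1, "chana": 1, "charl": 1, "chane": 1,
    "chan": 106, "chair": 1950,
}

if __name__ == "__main__":
    for word in rank("chanl", FREQS):
        print(word, FREQS[word])
```

Under that assumed formula, "chan" (distance 1 over length 4) scores lower
than the five-letter candidates at distance 1, so it sorts after "chand" and
the other single-frequency terms even though it occurs 106 times.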


-- 
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
