On Oct 8, 2008, at 6:20 PM, Jason Rennie wrote:

On Wed, Oct 8, 2008 at 3:31 PM, Jason Rennie <[EMAIL PROTECTED]> wrote:

I just tried J-W and *yes* it seems to do a much better job! I'd certainly
vote for that becoming the default :)


Ack! I did some more testing and J-W results started to get weird
(including suggesting "courses" for "coursets" even though "corsets" is
4x as frequent as "courses", and "nylo" for "nylom" even though "nylon"
is 200x more frequent than "nylo"). The default measure got these right.
Does J-W use frequency information at all?


Sorting in the SpellChecker is handled by Lucene's SuggestWord.compareTo() method, which compares scores first and uses frequency only to break ties between equal scores. It looks like:
public final int compareTo(SuggestWord a) {
  // first criteria: the edit distance
  if (score > a.score) {
    return 1;
  }
  if (score < a.score) {
    return -1;
  }

  // second criteria (if first criteria is equal): the popularity
  if (freq > a.freq) {
    return 1;
  }

  if (freq < a.freq) {
    return -1;
  }
  return 0;
}

You could open a JIRA issue against Lucene's SpellChecker asking for the sorting to be made overridable/pluggable. A patch to do so would be even better ;-)
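
To make the idea concrete, here is a rough sketch of what pluggable sorting might look like. Everything in it is hypothetical: `Suggestion` is a stand-in for SuggestWord rather than the real Lucene class, and `SCORE_THEN_FREQ`, `FREQ_THEN_SCORE`, `rank` and `best` are illustrative names, not an existing SpellChecker API. It assumes the caller can hand in a `java.util.Comparator`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PluggableSuggestionSort {

    // Hypothetical stand-in for SuggestWord (NOT the real Lucene class).
    static final class Suggestion {
        final String text;
        final float score; // similarity score from the distance measure
        final int freq;    // how often the word occurs in the index

        Suggestion(String text, float score, int freq) {
            this.text = text;
            this.score = score;
            this.freq = freq;
        }
    }

    // Same ordering as the compareTo() shown above:
    // score first, frequency only as a tie-breaker.
    static final Comparator<Suggestion> SCORE_THEN_FREQ =
        Comparator.<Suggestion>comparingDouble(s -> s.score)
                  .thenComparingInt(s -> s.freq);

    // One possible alternative: frequency first, score as the tie-breaker.
    static final Comparator<Suggestion> FREQ_THEN_SCORE =
        Comparator.<Suggestion>comparingInt(s -> s.freq)
                  .thenComparingDouble(s -> s.score);

    // Returns a copy of the candidates, best first, under the given ordering.
    static List<Suggestion> rank(List<Suggestion> candidates,
                                 Comparator<Suggestion> order) {
        List<Suggestion> sorted = new ArrayList<>(candidates);
        sorted.sort(order.reversed());
        return sorted;
    }

    // Convenience: the text of the top-ranked candidate.
    static String best(List<Suggestion> candidates,
                       Comparator<Suggestion> order) {
        return rank(candidates, order).get(0).text;
    }
}
```

With a measure whose scores rarely tie, `SCORE_THEN_FREQ` almost never consults the frequency, which would be consistent with the "nylo" vs. "nylon" behaviour described above. A caller who cares more about popularity could pass `FREQ_THEN_SCORE`, or a comparator that blends the two.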

Cheers,
Grant
