Mark,

I'd like to see your code if you open a JIRA for this.  I recently
opened SOLR-2010 with a patch that does something similar to only the
second part of what you describe (find combinations that actually return
a match).  But I'm not sure if my approach is the best one, so I would
like to see yours to compare.

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: Mark Holland [mailto:mark.holl...@zoopla.co.uk] 
Sent: Tuesday, July 27, 2010 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecking and frequency

Hi,

I found the suggestions returned from the standard Solr spellcheck not to
be that relevant. By contrast, aspell, given the same dictionary and
misspelled words, gives much more accurate suggestions.

I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
the Java aspell library. I also extended the SpellCheckComponent to take
the matrix of suggested words and query the corpus to find the first
combination of suggestions which returned a match. This works well for my
use case, where term frequency is irrelevant to spelling or scoring.
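
For readers following the thread, here is a minimal sketch of the "first
matching combination" idea described above. It is not Mark's code and it
does not use any Solr API: the class name FirstMatchingCombination, the
find method, and the hasMatch predicate (a stand-in for "run this
combination as a query and check for at least one hit") are all
hypothetical. It also assumes the matrix holds one list of suggestions
per query term, best suggestion first, and that combinations are tried
with the last term varying fastest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of Solr or of the code discussed in this thread.
class FirstMatchingCombination {

    /**
     * Tries combinations of suggestions (one per term) in order and returns
     * the first one accepted by hasMatch, or empty if none is accepted.
     * hasMatch is an assumed stand-in for querying the corpus.
     */
    static Optional<List<String>> find(List<List<String>> matrix,
                                       Predicate<List<String>> hasMatch) {
        if (matrix.isEmpty()) {
            return Optional.empty();
        }
        return search(matrix, 0, new ArrayList<>(), hasMatch);
    }

    // Depth-first walk: pick a suggestion for each term in turn.
    private static Optional<List<String>> search(List<List<String>> matrix,
                                                 int term,
                                                 List<String> chosen,
                                                 Predicate<List<String>> hasMatch) {
        if (term == matrix.size()) {
            // A full combination has been built; test it against the corpus.
            return hasMatch.test(chosen)
                    ? Optional.of(new ArrayList<>(chosen))
                    : Optional.empty();
        }
        for (String candidate : matrix.get(term)) {
            chosen.add(candidate);
            Optional<List<String>> hit = search(matrix, term + 1, chosen, hasMatch);
            chosen.remove(chosen.size() - 1);
            if (hit.isPresent()) {
                return hit; // stop at the first combination that matches
            }
        }
        return Optional.empty();
    }
}
```

In the worst case this issues one corpus check per combination, so the
number of suggestions kept per term would need to stay small.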

I'd like to publish the code in case someone finds it useful (although
it's a bit crude at the moment and will need a decent tidy up). Would it
be appropriate to open up a Jira issue for this?

Cheers,
~mark

On 27 July 2010 09:33, dan sutton <danbsut...@gmail.com> wrote:

> Hi,
>
> I've recently been looking into Spellchecking in solr, and was struck by
> how limited the usefulness of the tool was.
>
> Like most corpora, ours contains lots of different spelling mistakes for
> the same word, so the 'spellcheck.onlyMorePopular' is not really that
> useful unless you click on it numerous times.
>
> I was thinking that, since most of the time people spell words correctly,
> why was there no other frequency parameter that could enter into the
> score? i.e. something like:
>
> spell_score ~ edit_dist * freq
>
> I'm sure others have come across this issue and was wondering what
> steps/algorithms they have used to overcome these limitations?
>
> Cheers,
> Dan
>
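
A minimal sketch of the scoring idea in Dan's formula (spell_score ~
edit_dist * freq). It assumes "edit_dist" is read as a similarity in
[0, 1] where higher means closer to the typed word, and "freq" as the
candidate's frequency in the index; both readings are assumptions, as
are the names FrequencyWeightedRanker, Candidate and rank. It is not
Solr code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of ranking suggestions by similarity * frequency.
class FrequencyWeightedRanker {

    static final class Candidate {
        final String word;
        final double similarity; // assumed: 0..1, higher = closer to the input
        final long frequency;    // assumed: occurrences of the word in the index

        Candidate(String word, double similarity, long frequency) {
            this.word = word;
            this.similarity = similarity;
            this.frequency = frequency;
        }

        // spell_score ~ edit_dist * freq, under the assumptions above.
        double score() {
            return similarity * frequency;
        }
    }

    /** Returns the candidate words ordered from highest to lowest score. */
    static List<String> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        Comparator<Candidate> byScore = Comparator.comparingDouble(Candidate::score);
        sorted.sort(byScore.reversed());
        List<String> words = new ArrayList<>();
        for (Candidate c : sorted) {
            words.add(c.word);
        }
        return words;
    }
}
```

With a plain product, a rare misspelling that is very close to the input
loses to a common word that is slightly further away, which is the
behaviour the question asks for.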
