Hi,

I found the suggestions returned by the standard Solr spellchecker not to be
that relevant. By contrast, aspell, given the same dictionary and misspelled
words, gives much more accurate suggestions.

I therefore wrote an implementation of SolrSpellChecker that wraps Jazzy,
the Java spell-checking library modelled on aspell. I also extended the
SpellCheckComponent to take the matrix of suggested words and query the corpus
to find the first combination of suggestions that returns a match. This works
well for my use case, where term frequency is irrelevant to spelling or scoring.
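
A minimal Java sketch of the "first matching combination" idea, assuming each
misspelled word already has an ordered list of suggestions. The names
firstMatch and hasMatch are hypothetical; in a real component hasMatch would
run a query against the index and check whether any documents came back, while
here it is faked with an in-memory set. The iteration order (last word varies
fastest) is also an assumption. This illustrates the approach, not the actual
component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

class FirstMatchingCombination {

    // Walks the cartesian product of the per-word suggestion lists like an
    // odometer (the last word's suggestions vary fastest) and returns the
    // first combination the predicate accepts.
    static Optional<List<String>> firstMatch(
            List<List<String>> suggestions, Predicate<List<String>> hasMatch) {
        if (suggestions.isEmpty()) {
            return Optional.empty();
        }
        for (List<String> perWord : suggestions) {
            if (perWord.isEmpty()) {
                return Optional.empty(); // some word has no suggestion: nothing to try
            }
        }
        int n = suggestions.size();
        int[] idx = new int[n]; // current suggestion index for each word
        while (true) {
            List<String> combo = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                combo.add(suggestions.get(i).get(idx[i]));
            }
            if (hasMatch.test(combo)) {
                return Optional.of(combo);
            }
            // Advance to the next combination; stop once every one has been tried.
            int pos = n - 1;
            while (pos >= 0 && ++idx[pos] == suggestions.get(pos).size()) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Suggestions for the misspelled query "speling chekcer", best first.
        List<List<String>> suggestions = List.of(
                List.of("spelling", "spewing"),
                List.of("chequer", "checker"));
        // Hypothetical stand-in for an index query: the "corpus" is one phrase.
        Set<String> corpus = Set.of("spelling checker");
        Predicate<List<String>> hasMatch = combo -> corpus.contains(String.join(" ", combo));
        System.out.println(firstMatch(suggestions, hasMatch).orElse(List.of()));
    }
}
```

Running main prints [spelling, checker]: the first combination tried,
"spelling chequer", has no match. The number of combinations is the product of
the per-word list sizes, and each one may cost a query, so keeping the number of
suggestions per word small (spellcheck.count) keeps this bounded.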

I'd like to publish the code in case someone finds it useful (although it's
a bit crude at the moment and will need a decent tidy-up). Would it be
appropriate to open a JIRA issue for this?

Cheers,
~mark

On 27 July 2010 09:33, dan sutton <danbsut...@gmail.com> wrote:

> Hi,
>
> I've recently been looking into spellchecking in Solr, and was struck by
> how limited the usefulness of the tool was.
>
> Like most corpora, ours contains lots of different spelling mistakes for
> the same word, so 'spellcheck.onlyMorePopular' is not really that useful
> unless you click on it numerous times.
>
> Since people spell words correctly most of the time, I was wondering why
> there is no other frequency parameter that could enter into the score,
> i.e. something like:
>
> spell_score ~ edit_dist * freq
>
> I'm sure others have come across this issue, and I was wondering what
> steps or algorithms they have used to overcome these limitations.
>
> Cheers,
> Dan
>
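
Dan's proposed score, spell_score ~ edit_dist * freq, can be sketched in Java
under a couple of assumptions. Taken literally, multiplying a raw edit distance
by frequency would favour more distant words, so the sketch reads edit_dist as
a normalised similarity in [0, 1] (1 = identical, computed as
1 - levenshtein / longer word length), which is how Lucene's LevensteinDistance
reports its result. The class and method names and the document frequencies
are made up for illustration; they are not part of Solr. Note that the
misspelling itself is in the frequency map, as it would be in a real corpus.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class FrequencyWeightedSpellScore {

    // Classic dynamic-programming Levenshtein edit distance (two rolling rows).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1 means identical, normalised by the longer word.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Ranks candidates by spell_score = similarity * document frequency, highest first.
    static List<String> rank(String misspelled, Map<String, Integer> docFreq) {
        List<String> candidates = new ArrayList<>(docFreq.keySet());
        candidates.sort(Comparator.comparingDouble(
                (String c) -> similarity(misspelled, c) * docFreq.get(c)).reversed());
        return candidates;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies; the misspelling occurs in the corpus too.
        Map<String, Integer> docFreq = Map.of("receive", 950, "recieve", 12, "relieve", 300);
        System.out.println(rank("recieve", docFreq));
    }
}
```

Running main prints [receive, relieve, recieve]: "receive" (similarity 5/7,
frequency 950) beats both the closer "relieve" and the exact but rare
misspelling. Because raw frequency is used, a very common word can outrank a
much closer one; dampening it, e.g. with log(1 + freq), is one variant worth
trying.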
