Hi Mark,

Thanks for that info, it looks very interesting; it would be great to see your code. Out of interest, did you use both the dictionary and the phonetic file? Did you see better results with both?
Regarding the second part, checking the corpus for matching suggestions: would another way to do this be to have an event listener that listens for commits and builds the dictionary of matching corpus words at that point? That way you avoid the performance hit at query time.

Cheers,
Dan

On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland <mark.holl...@zoopla.co.uk> wrote:
> Hi,
>
> I found the suggestions returned from the standard Solr spellcheck not to be
> that relevant. By contrast, aspell, given the same dictionary and misspelled
> words, gives much more accurate suggestions.
>
> I therefore wrote an implementation of SolrSpellChecker that wraps Jazzy,
> the Java aspell library. I also extended the SpellCheckComponent to take the
> matrix of suggested words and query the corpus to find the first combination
> of suggestions which returned a match. This works well for my use case,
> where term frequency is irrelevant to spelling or scoring.
>
> I'd like to publish the code in case someone finds it useful (although it's
> a bit crude at the moment and will need a decent tidy-up). Would it be
> appropriate to open up a Jira issue for this?
>
> Cheers,
> ~mark
>
> On 27 July 2010 09:33, dan sutton <danbsut...@gmail.com> wrote:
>
> > Hi,
> >
> > I've recently been looking into spellchecking in Solr, and was struck by
> > how limited the usefulness of the tool was.
> >
> > Like most corpora, ours contains lots of different spelling mistakes for
> > the same word, so 'spellcheck.onlyMorePopular' is not really that useful
> > unless you click on it numerous times.
> >
> > I was thinking that, since most of the time people spell words correctly,
> > why is there no other frequency parameter that could enter into the score?
> > e.g. something like:
> >
> > spell_score ~ edit_dist * freq
> >
> > I'm sure others have come across this issue, and I was wondering what
> > steps/algorithms they have used to overcome these limitations?
> >
> > Cheers,
> > Dan
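Mark's message describes extending the SpellCheckComponent to take the matrix of suggested words and query the corpus for the first combination that returns a match. His code isn't in this thread, so what follows is only a minimal Java sketch of that search under some assumptions: the names `FirstMatchingCombination` and `find` are hypothetical, "first" is taken to mean lexicographic order over each word's suggestion list, and `hasMatch` is a placeholder for whatever corpus check is used (a live query per combination, or a structure built ahead of time on commit, as Dan suggests). The toy set of phrases in `main` stands in for the corpus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class FirstMatchingCombination {

    /**
     * Walks the Cartesian product of per-word suggestion lists and returns the
     * first combination that hasMatch accepts. Combinations are tried in
     * lexicographic order of suggestion index (the last word varies fastest).
     * hasMatch is a hypothetical stand-in for "query the corpus for this phrase".
     */
    public static Optional<List<String>> find(List<List<String>> suggestions,
                                              Predicate<List<String>> hasMatch) {
        for (List<String> s : suggestions) {
            if (s.isEmpty()) {
                return Optional.empty(); // a word with no candidates blocks every combination
            }
        }
        int n = suggestions.size();
        int[] idx = new int[n]; // current suggestion index for each word
        while (true) {
            List<String> combo = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                combo.add(suggestions.get(i).get(idx[i]));
            }
            if (hasMatch.test(combo)) {
                return Optional.of(combo);
            }
            // Advance like an odometer; roll over into the previous word when a list is exhausted.
            int pos = n - 1;
            while (pos >= 0 && ++idx[pos] == suggestions.get(pos).size()) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                return Optional.empty(); // every combination has been tried
            }
        }
    }

    public static void main(String[] args) {
        // Toy stand-in for the corpus: phrases that would return at least one hit.
        Set<String> corpus = Set.of("flat in london", "house in leeds");
        List<List<String>> suggestions = List.of(
                List.of("flat", "fiat"),
                List.of("in"),
                List.of("londom", "london"));
        Optional<List<String>> hit = find(suggestions,
                combo -> corpus.contains(String.join(" ", combo)));
        System.out.println(hit.orElse(List.of())); // prints [flat, in, london]
    }
}
```

In the worst case this tries every combination (the product of the list sizes), and each attempt costs one corpus check. That cost is exactly what Dan's commit-time idea would try to move out of the query path, for example by backing `hasMatch` with a lookup precomputed on commit instead of a live query.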
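Dan's `spell_score ~ edit_dist * freq` is only a rough relation. Read literally, multiplying by edit distance would favour more distant words, so this sketch assumes the intent is a score that rises with corpus frequency and falls with edit distance. The scoring function (`log1p(freq) / (1 + distance)`), the class name `FrequencyWeightedRanker` and the frequencies in `main` are all illustrative assumptions, not something Solr or Jazzy provides; edit distance here is plain Levenshtein.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FrequencyWeightedRanker {

    /** Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical score: grows with (log-damped) corpus frequency, shrinks with edit distance. */
    static double score(String misspelled, String candidate, long freq) {
        return Math.log1p(freq) / (1 + editDistance(misspelled, candidate));
    }

    /** Candidates sorted best-first; ties are broken alphabetically so the order is stable. */
    static List<String> rank(String misspelled, Map<String, Long> freqs) {
        Comparator<Map.Entry<String, Long>> byScore =
                Comparator.comparingDouble(e -> score(misspelled, e.getKey(), e.getValue()));
        return freqs.entrySet().stream()
                .sorted(byScore.reversed().thenComparing(Map.Entry::getKey))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up document frequencies: the misspelling itself also occurs in the corpus.
        Map<String, Long> freqs = Map.of("receive", 500L, "relieve", 40L, "recieve", 3L);
        System.out.println(rank("recieve", freqs)); // prints [receive, relieve, recieve]
    }
}
```

The log damping is one possible design choice to stop a very common word from beating a much closer one on frequency alone. With the made-up counts above, the correct spelling `receive` (distance 2) outranks the exact-match misspelling `recieve` (distance 0) because it is far more frequent, which is the behaviour the original message is asking for.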