Mark, I'd like to see your code if you open a JIRA issue for this. I recently opened SOLR-2010 with a patch that does something similar to only the second part of what you describe (finding combinations that actually return a match). But I'm not sure my approach is the best one, so I would like to see yours for comparison.
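The "find combinations that actually return a match" step can be sketched as a search over the cartesian product of per-term suggestions. This is a hypothetical standalone illustration, not the SOLR-2010 patch or Mark's code. It assumes the suggestion matrix is one list of candidate corrections per query term, best first. It also assumes that `matchesCorpus` (a hypothetical stand-in) runs the combined query against Solr and checks that it returns at least one hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class CollationSearch {

    /**
     * Walks the cartesian product of per-term suggestions in "odometer" order
     * (the last term's suggestions vary fastest) and returns the first
     * combination accepted by the predicate. Hypothetical helper, not a Solr API.
     */
    public static Optional<List<String>> firstMatchingCombination(
            List<List<String>> suggestions,
            Predicate<List<String>> matchesCorpus,
            int maxTries) {
        int n = suggestions.size();
        for (List<String> row : suggestions) {
            if (row.isEmpty()) return Optional.empty(); // no candidates for some term
        }
        int[] idx = new int[n]; // current position within each row of suggestions
        for (int tries = 0; tries < maxTries; tries++) {
            List<String> combo = new ArrayList<>(n);
            for (int i = 0; i < n; i++) combo.add(suggestions.get(i).get(idx[i]));
            if (matchesCorpus.test(combo)) return Optional.of(combo);
            // Advance the odometer: bump the last index, carrying leftwards on overflow.
            int pos = n - 1;
            while (pos >= 0) {
                idx[pos]++;
                if (idx[pos] < suggestions.get(pos).size()) break;
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) return Optional.empty(); // every combination has been tried
        }
        return Optional.empty(); // gave up after maxTries queries
    }

    public static void main(String[] args) {
        // Toy "corpus" of phrases; a real implementation would query Solr
        // and test whether numFound > 0.
        Set<String> corpus = Set.of("flat in london", "house in london");
        List<List<String>> suggestions = List.of(
                List.of("flit", "flat", "fiat"),
                List.of("in"),
                List.of("londen", "london"));
        Optional<List<String>> hit = firstMatchingCombination(
                suggestions, combo -> corpus.contains(String.join(" ", combo)), 100);
        System.out.println(hit.map(c -> String.join(" ", c)).orElse("no match"));
    }
}
```

The `maxTries` cap is there because the number of combinations is the product of the row sizes, and issuing one query per combination gets expensive quickly.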
James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: Mark Holland [mailto:mark.holl...@zoopla.co.uk]
Sent: Tuesday, July 27, 2010 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecking and frequency

Hi,

I found the suggestions returned from the standard Solr spellcheck not to be that relevant. By contrast, aspell, given the same dictionary and misspelled words, gives much more accurate suggestions. I therefore wrote an implementation of SolrSpellChecker that wraps Jazzy, the Java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions that returns a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring.

I'd like to publish the code in case someone finds it useful (although it's a bit crude at the moment and will need a decent tidy-up). Would it be appropriate to open a JIRA issue for this?

Cheers,
~mark

On 27 July 2010 09:33, dan sutton <danbsut...@gmail.com> wrote:
> Hi,
>
> I've recently been looking into spellchecking in Solr, and was struck by
> how limited the usefulness of the tool was.
>
> Like most corpora, ours contains lots of different spelling mistakes for
> the same word, so 'spellcheck.onlyMorePopular' is not really that useful
> unless you click on it numerous times.
>
> Since most of the time people spell words correctly, why is there no
> other frequency parameter that could enter into the score? i.e. something
> like:
>
> spell_score ~ edit_dist * freq
>
> I'm sure others have come across this issue, and I was wondering what
> steps/algorithms they have used to overcome these limitations.
>
> Cheers,
> Dan
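Dan's `spell_score ~ edit_dist * freq` idea can be sketched as below. Read literally, multiplying by the raw edit distance would favour worse matches. This sketch therefore assumes one plausible interpretation: a normalized similarity (1 minus the edit distance divided by the longer word's length) multiplied by a log-dampened document frequency. The class, the method names, and the toy frequencies are hypothetical and not part of Solr's API. A real version would live inside a SolrSpellChecker and read frequencies from the index.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FrequencyWeightedSpell {

    /** Classic two-row dynamic-programming Levenshtein edit distance. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical score: normalized similarity times log(1 + docFreq). */
    public static double spellScore(String misspelled, String candidate, long docFreq) {
        int maxLen = Math.max(misspelled.length(), candidate.length());
        double similarity = maxLen == 0
                ? 1.0
                : 1.0 - (double) editDistance(misspelled, candidate) / maxLen;
        return similarity * Math.log1p(docFreq);
    }

    /** Ranks candidate corrections by spellScore, highest first. */
    public static List<String> rank(String misspelled, Map<String, Long> candidateFreqs) {
        return candidateFreqs.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, Long> e) -> spellScore(misspelled, e.getKey(), e.getValue()))
                        .reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy document frequencies; in Solr these would come from the index.
        Map<String, Long> freqs = Map.of("london", 5000L, "londen", 3L, "lindon", 40L);
        System.out.println(rank("londn", freqs));
    }
}
```

With these toy numbers, "london" and "londen" are both one edit away from "londn", so edit distance alone cannot separate them. The frequency term puts the common spelling first. The log dampening is a design choice that makes a very frequent term less likely to outrank a much closer one purely on popularity.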