The angle I am trying here is to build the dictionary from indexed terms that contain only correctly spelled words. We do this by giving the field from which the dictionary is built a type that uses solr.KeepWordFilterFactory, which in turn reads a text file of known correctly spelled words (including their derivations, for example: lead, leads, leading, etc.).
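A minimal schema.xml sketch of this kind of setup, for illustration only. The names text_spell_keep, spell_source and correct_words.txt are hypothetical, and the tokenizer and lowercase filter around the keep-word filter are assumed rather than taken from the actual schema:

```xml
<!-- Hypothetical field type: keeps only tokens listed in correct_words.txt -->
<fieldType name="text_spell_keep" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- words: one known-good word per line (lead, leads, leading, ...) -->
    <filter class="solr.KeepWordFilterFactory" words="correct_words.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- Hypothetical dictionary source field using that type -->
<field name="spell_source" type="text_spell_keep" indexed="true" stored="false" multiValued="true"/>
```

With a type like this, any token that is not in the word list is dropped at index time, so it never reaches the spelling dictionary built from that field.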
This is working great for us, except for those fields in our schema that contain proper names. I can't seem to get (unfiltered) terms from those fields, along with (correctly spelled) terms from other fields, into the single field upon which the dictionary is built.

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingrambook.com]
Sent: Thursday, June 02, 2011 11:40 AM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Actually, someone just pointed out to me that a patch like this is unnecessary. The code works as-is if configured like this:

  <float name="thresholdTokenFrequency">.01</float>  (correct)

instead of this:

  <str name="thresholdTokenFrequency">.01</str>  (incorrect)

I tested this and it seems to work. I'm still trying to figure out whether using this parameter actually improves the quality of our spell suggestions, now that I know how to use it properly.

Sorry about the misinformation earlier.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Dyer, James
Sent: Wednesday, June 01, 2011 3:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Tanner,

I just entered SOLR-2571 to fix the float-parsing bug that breaks "thresholdTokenFrequency". It's just a one-line code fix, so I also included a patch that should apply cleanly to Solr 3.1. See https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.

This parameter appears to be absent from the wiki, and as it has always been broken for me, I haven't tested it. However, my understanding is that it should be set to the minimum percentage of documents in which a term has to occur in order for it to appear in the spelling dictionary. For instance, in the config below, a term would have to occur in at least 1% of the documents for it to be part of the spelling dictionary.
This might be a good setting for long fields, but for the short fields in my application I was thinking of setting this to something like 1/1000 of 1% ...

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">spellchecker</str>
    <str name="field">Spelling_Dictionary</str>
    <str name="fieldType">text</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="thresholdTokenFrequency">.01</str>
  </lst>
</searchComponent>

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Tanner Postert [mailto:tanner.post...@gmail.com]
Sent: Friday, May 27, 2011 6:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck Phrases

Are there any updates on this? Any third-party apps that can make this work as expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James <james.d...@ingrambook.com> wrote:

> Tanner,
>
> Currently Solr will only make suggestions for words that are not in
> the dictionary, unless you specify "spellcheck.onlyMorePopular=true".
> However, if you do that, then it will try to "improve" every word in
> your query, even the ones that are spelled correctly (so while it
> might change "brake" to "break", it might also change "leg" to "log").
>
> You might be able to alleviate some of the pain by setting the
> "thresholdTokenFrequency" so as to remove misspelled and rarely used
> words from your dictionary, although I personally haven't been able to
> get this parameter to work. It also doesn't seem to be documented on
> the wiki, but it is in the 1.4.1 source code, in class
> IndexBasedSpellChecker. It's also mentioned in Smiley & Pugh's book.
> I tried setting it like this, but got a ClassCastException on the
> float value:
>
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>   <str name="queryAnalyzerFieldType">text_spelling</str>
>   <lst name="spellchecker">
>     <str name="name">spellchecker</str>
>     <str name="field">Spelling_Dictionary</str>
>     <str name="fieldType">text_spelling</str>
>     <str name="buildOnOptimize">true</str>
>     <str name="thresholdTokenFrequency">.0000001</str>
>   </lst>
> </searchComponent>
>
> I have it on my to-do list to look into this further but haven't yet.
> If you decide to try it and can get it to work, please let me know how
> you do it.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Tanner Postert [mailto:tanner.post...@gmail.com]
> Sent: Wednesday, February 23, 2011 12:53 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck Phrases
>
> Right now when I search for 'brake a leg', Solr returns valid results
> with no indication of misspelling, which is understandable since all
> of those terms are valid words and are probably found in a few pieces
> of our content.
>
> My question is:
>
> Is there any way for it to recognize that the phrase should be
> "break a leg" and not "brake a leg", and suggest the proper phrase?