No one has any input on my post below about the spelling suggestions? I find it a bit frustrating not being able to understand this feature better, or why it doesn't give the expected results. A built-in "explain" feature would really have helped.
/Jimi

-----Original Message-----
From: jimi.hulleg...@svensktnaringsliv.se [mailto:jimi.hulleg...@svensktnaringsliv.se]
Sent: Friday, December 16, 2016 9:58 PM
To: solr-user@lucene.apache.org
Subject: Can't get spelling suggestions to work properly

Hi,

I'm trying to add the spelling suggestion feature to our search, but I'm having problems getting suggestions for some misspellings.

For example, the Swedish word 'mycket' exists in ~14,000 of a total of ~40,000 documents in our index. A search for the incorrect spelling 'myket' (a missing 'c') gives several spelling suggestions, and the top one is 'mycket'. This is the expected behavior. But a search for the incorrect spelling 'mycet' (a missing 'k') gives no spelling suggestions.

The only difference between these two searches is that the one that produced spelling suggestions had zero results, while the other one had two (2) results. Those two documents contain the incorrect spelling ('mycet'). Can this be the cause of the missing suggestions? I have set 'maxQueryFrequency' to 0.001, and with 40,000 documents in the index that should mean the word can exist in up to 40 documents. Since 2 is less than 40, I would argue that this word should be considered a spelling mistake. But for some reason the Solr spellchecker considers 'myket' an incorrect spelling, while 'mycet' is incorrectly considered a correct spelling.

I also tried spellcheck.accuracy=0, just to rule out that my accuracy setting is too high, but that didn't help.

Can someone see what I'm doing wrong, or give some tips on configuration changes and/or how I can troubleshoot this? For example, is there any way to debug the spellchecker function?
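To spell out the arithmetic behind my expectation, here is a minimal sketch. It assumes that a fractional 'maxQueryFrequency' is read as a share of the total document count and that a value of 1 or more is read as an absolute document count. That is my reading of the setting, not something I have verified in the Solr source. The function name is my own and is not part of Solr.

```python
import math


def max_docs_for_suggestions(max_query_frequency: float, total_docs: int) -> int:
    """Highest document frequency a query term may have and still be
    treated as a possible misspelling (assumed interpretation)."""
    if max_query_frequency >= 1:
        # Assumption: values of 1 or more are absolute document counts.
        return int(max_query_frequency)
    # Assumption: values below 1 are a fraction of all documents.
    return math.ceil(max_query_frequency * total_docs)


TOTAL_DOCS = 40_000      # approximate index size
MYCET_DOC_FREQ = 2       # documents containing the misspelling 'mycet'

for frequency in (0.001, 0.01):
    limit = max_docs_for_suggestions(frequency, TOTAL_DOCS)
    print(frequency, limit, MYCET_DOC_FREQ <= limit)
```

Under that assumption the limit is 40 documents for 0.001 and 400 documents for 0.01, so a term found in only 2 documents stays below the limit either way.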
Here are the searches:

Search for 'myket':

http://localhost:8080/solr/s2/select/?q=myket&rows=100&sort=score+desc&fl=*%2Cscore%2C%5Bexplain+style%3Dtext%5D&defType=edismax&qf=title%5E2&qf=swedishText1%5E1&spellcheck=true&spellcheck.accuracy=0&spellcheck.maxCollationTries=200&fq=%2Bactivatedate%3A%5B*+TO+NOW%5D+%2Bexpiredate%3A%5BNOW+TO+*%5D+%2B%28state%3Apublished+OR+state%3Adraft-published+OR+state%3Asubmitted-published+OR+state%3Aapproved-published%29&wt=xml&indent=true

Spellcheck output for 'myket':

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="myket">
      <int name="numFound">16</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">mycket</str>
          <int name="freq">14039</int>
        </lst>
        [...]
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">mycket</str>
      <int name="hits">14005</int>
      <lst name="misspellingsAndCorrections">
        <str name="myket">mycket</str>
      </lst>
    </lst>
    [...]
    </lst>
  </lst>
</lst>

Search for 'mycet':

http://localhost:8080/solr/s2/select/?q=mycet&rows=100&sort=score+desc&fl=*%2Cscore%2C%5Bexplain+style%3Dtext%5D&defType=edismax&qf=title%5E2&qf=swedishText1%5E1&spellcheck=true&spellcheck.accuracy=0&spellcheck.maxCollationTries=200&fq=%2Bactivatedate%3A%5B*+TO+NOW%5D+%2Bexpiredate%3A%5BNOW+TO+*%5D+%2B%28state%3Apublished+OR+state%3Adraft-published+OR+state%3Asubmitted-published+OR+state%3Aapproved-published%29&wt=xml&indent=true

Spellcheck output for 'mycet':

<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">true</bool>
  </lst>
</lst>

Below is the relevant configuration.
The field type used for the spellchecker:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([.])" replacement=" " />
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordRepeatFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

Parameters added to the standard request handler:

<str name="spellcheck.count">20</str>
<str name="spellcheck.dictionary">swedishSpelling</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollations">2</str>
<str name="spellcheck.maxCollationTries">10</str>

And the spellcheck component:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">swedishSpelling</str>
    <str name="field">swedishSpelling</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.0</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
    <float name="thresholdTokenFrequency">0.001</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">englishSpelling</str>
    <str name="field">englishSpelling</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.0</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.001</float>
    <float name="thresholdTokenFrequency">0.0025</float>
  </lst>
</searchComponent>

Regards
/Jimi