Chris,

- DirectSpellChecker has a "minPrefix" setting, which the techproducts example sets to 1 (also the default), so it will never try to correct the first character. I think this is both a performance optimization and based on the assumption that we rarely misspell the first character. This is why it will not correct "hell" to "dell". I think it will allow you to set this to 0 if you want your sample query to work.

- The "maxCollationTries" feature rewrites "q" / "spellcheck.q" and then, using all the other parameters, queries internally to see if there are any hits. This doesn't play very well with "q.op=OR" / "mm=1". So when you see a collation like "here ultrasharp" / "heat ..." etc., it is indeed getting some hits, so it is considered a valid query rewrite, despite the absurdity. We could improve this example config by adding "spellcheck.collateParam.q.op=AND" to the defaults. (When using dismax, you would add "spellcheck.collateParam.mm=100%".) Also, while the "collateParam" functionality is in the old Solr wiki, it doesn't seem to be in the reference manual, so we should probably add it, as this would be pretty important for a lot of users.

- Unless using the legacy IndexBasedSpellChecker / FileBasedSpellChecker, you need not use "spellcheck.build". It's a no-op for both Direct and WordBreak, as these do not use sidecar indexes.

So without changing the config, these queries illustrate the spellchecker pretty well, including the word-break functionality.

http://localhost:8983/solr/techproducts/spell?spellcheck.q=dzll+ultra%20sharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND

http://localhost:8983/solr/techproducts/spell?spellcheck.q=dellultrasharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND

Spellcheck has a lot of gotchas, and I wish we could dream up a way to make it easy for people. I remember it being a struggle for me when I was a new user, and I know we get lots of questions on the user-list about it.

My apologies to you for not answering this sooner.

James Dyer
Ingram Content Group

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, December 17, 2014 6:49 PM
To: solr-user@lucene.apache.org
Subject: can't make sense of spellchecker results when using techproducts example

Ok, so i've been working on updating the ref guide to account for the new way to run the "examples" in 5.0.

The spell checking page...

https://cwiki.apache.org/confluence/display/solr/Spell+Checking

...has some examples that loosely correlate to the "techproducts" example, but even if you ignore the specifics of those examples, i need help understanding the basic behavior of the spellchecker as configured in the techproducts

Assuming you run this...

bin/solr -e techproducts

...with that example running & those docs indexed, this URL gives me results i can't explain...

http://localhost:8983/solr/techproducts/spell?spellcheck.q=hell+ultrashar&df=text&spellcheck=true&spellcheck.build=true

(see below)

1) "dell" is not listed as a possible suggestion for "hell" (even if the dictionary thinks "hold" is a better suggestion, why isn't "dell" even included in the list of possibilities?)

2) in the "collation" section, i can't make any sense of what these results mean -- how is "hello ultrasharp" a suggested collationQuery when *none* of the example docs contain both "hello" and "ultrasharp"?

http://localhost:8983/solr/techproducts/select?df=text&q=%2Bhello+%2Bultrasharp

So WTF is up with these spell check results?

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
  </lst>
  <str name="command">build</str>
  <result name="response" numFound="0" start="0">
  </result>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="hell">
        <int name="numFound">6</int>
        <int name="startOffset">0</int>
        <int name="endOffset">4</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">hello</str>
            <int name="freq">1</int>
          </lst>
          <lst>
            <str name="word">here</str>
            <int name="freq">2</int>
          </lst>
          <lst>
            <str name="word">heat</str>
            <int name="freq">1</int>
          </lst>
          <lst>
            <str name="word">hold</str>
            <int name="freq">1</int>
          </lst>
          <lst>
            <str name="word">html</str>
            <int name="freq">1</int>
          </lst>
          <lst>
            <str name="word">héllo</str>
            <int name="freq">1</int>
          </lst>
        </arr>
      </lst>
      <lst name="ultrashar">
        <int name="numFound">1</int>
        <int name="startOffset">5</int>
        <int name="endOffset">14</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">ultrasharp</str>
            <int name="freq">1</int>
          </lst>
        </arr>
      </lst>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collations">
      <lst name="collation">
        <str name="collationQuery">hello ultrasharp</str>
        <int name="hits">2</int>
        <lst name="misspellingsAndCorrections">
          <str name="hell">hello</str>
          <str name="ultrashar">ultrasharp</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">here ultrasharp</str>
        <int name="hits">3</int>
        <lst name="misspellingsAndCorrections">
          <str name="hell">here</str>
          <str name="ultrashar">ultrasharp</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">heat ultrasharp</str>
        <int name="hits">2</int>
        <lst name="misspellingsAndCorrections">
          <str name="hell">heat</str>
          <str name="ultrashar">ultrasharp</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">hold ultrasharp</str>
        <int name="hits">2</int>
        <lst name="misspellingsAndCorrections">
          <str name="hell">hold</str>
          <str name="ultrashar">ultrasharp</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">html ultrasharp</str>
        <int name="hits">2</int>
        <lst name="misspellingsAndCorrections">
          <str name="hell">html</str>
          <str name="ultrashar">ultrasharp</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

-Hoss
http://www.lucidworks.com/
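
For reference, the two config changes suggested in the reply at the top of this thread (setting "minPrefix" to 0 and adding "spellcheck.collateParam.q.op=AND" to the defaults) might look roughly like the fragment below. This is only a sketch: the component name "spellcheck", the "/spell" handler, the "text" field, the DirectSolrSpellChecker class name, and the "maxCollationTries" value of 5 are assumptions based on the stock techproducts solrconfig.xml and should be checked against the actual file.

```xml
<!-- Sketch only: names and values assume the stock techproducts solrconfig.xml. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- 0 allows the first character to be corrected ("hell" to "dell");
         the example ships with 1 -->
    <int name="minPrefix">0</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.collate">true</str>
    <!-- assumed value; collations are only tested against the index when this is above 0 -->
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- test each collation with AND so that every corrected term must match -->
    <str name="spellcheck.collateParam.q.op">AND</str>
    <!-- when using dismax, the equivalent would be: -->
    <!-- <str name="spellcheck.collateParam.mm">100%</str> -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With "q.op=AND" applied to the internal collation test queries, a rewrite such as "here ultrasharp" should only be returned if some document contains both terms, which is the behavior the original question expected.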