James: everything you said made perfect sense, and in hindsight it was actually covered on the page; it was just the example that was bogus in light of the current config & defaults.
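For anyone updating the example, here is a minimal Python sketch of the kind of /spell request James suggests. It assumes the techproducts example is running locally (bin/solr -e techproducts) on the default port 8983; BASE_URL and spell_url are hypothetical names for illustration. The only point it makes is that spellcheck.collateParam.q.op=AND is sent alongside the normal params, so the internal queries used to verify collations must match every term.

```python
from urllib.parse import urlencode

# Assumption: techproducts example running locally on the default port 8983.
BASE_URL = "http://localhost:8983/solr/techproducts/spell"


def spell_url(query: str, collate_op: str = "AND") -> str:
    """Build a hypothetical /spell request URL for the given spellcheck.q."""
    params = [
        ("spellcheck.q", query),
        ("df", "text"),
        ("spellcheck", "true"),
        # Overrides q.op only for the internal collation-check queries, so a
        # collation needs hits on ALL of its terms to be considered valid.
        ("spellcheck.collateParam.q.op", collate_op),
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # A misspelling case and a word-break case, as in James's example queries.
    for q in ("dzll ultra sharp", "dellultrasharp"):
        print(spell_url(q))
```

With dismax/edismax, the equivalent James mentions is spellcheck.collateParam.mm=100%; urlencode would escape the % as %25.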
I went ahead and fixed it based on your feedback, and beefed up the explanation of spellcheck.collateParam.* (now it's part of the table instead of just a one-off sentence out of context):

https://cwiki.apache.org/confluence/display/solr/Spell+Checking
https://cwiki.apache.org/confluence/pages/diffpages.action?pageId=32604254&originalId=50859120

thanks!

: Date: Fri, 9 Jan 2015 14:22:43 -0600
: From: "Dyer, James" <james.d...@ingramcontent.com>
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
: Subject: RE: can't make sense of spellchecker results when using techproducts example
:
: Chris,
:
: - DirectSpellChecker has a setting for "minPrefix" which the techproducts example sets to 1 (also the default). So it will never try to correct the first character. I think this is both a performance optimization and is based on the assumption that we rarely misspell the first character. This is why it will not correct "hell" to "dell". I think it will allow you to set this to 0, if you want your sample query to work.
:
: - The "maxCollationTries" feature rewrites "q" / "spellcheck.q", and then, using all the other parameters, queries internally to see if there are any hits. This doesn't play very well with "q.op=OR" / "mm=1". So when you see a collation like "here ultrasharp" / "heat ..." etc., it is indeed getting some hits, so it considers it a valid query rewrite, despite the absurdity. We could improve this example config by adding "spellcheck.collateParam.q.op=AND" to the defaults. (When using dismax, you would add "spellcheck.collateParam.mm=100%".) Also, while the "collateParam" functionality is in the old Solr wiki, it doesn't seem to be in the reference manual, so we probably should add it, as this would be pretty important for a lot of users.
:
: - Unless using the legacy IndexBasedSpellChecker / FileBasedSpellChecker, you need not use "spellcheck.build".
:   It's a no-op for both Direct and WordBreak, as these do not use sidecar indexes.
:
: So without changing the config, these queries illustrate the spellchecker pretty well, including the word-break functionality:
:
: http://localhost:8983/solr/techproducts/spell?spellcheck.q=dzll+ultra%20sharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
: http://localhost:8983/solr/techproducts/spell?spellcheck.q=dellultrasharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
:
: Spellcheck has a lot of gotchas, and I wish we could dream up a way to make it easy for people. I remember it being a struggle for me when I was a new user, and I know we get lots of questions on the user list about it.
:
: My apologies to you for not answering this sooner.
:
: James Dyer
: Ingram Content Group
:
:
: -----Original Message-----
: From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
: Sent: Wednesday, December 17, 2014 6:49 PM
: To: solr-user@lucene.apache.org
: Subject: can't make sense of spellchecker results when using techproducts example
:
:
: Ok, so I've been working on updating the ref guide to account for the new way to run the "examples" in 5.0.
:
: The spell checking page...
:
: https://cwiki.apache.org/confluence/display/solr/Spell+Checking
:
: ...has some examples that loosely correlate to the "techproducts" example, but even if you ignore the specifics of those examples, I need help understanding the basic behavior of the spellchecker as configured in the techproducts example.
:
: Assuming you run this...
:
: bin/solr -e techproducts
:
: ...with that example running & those docs indexed, this URL gives me results I can't explain...
:
: http://localhost:8983/solr/techproducts/spell?spellcheck.q=hell+ultrashar&df=text&spellcheck=true&spellcheck.build=true
:
: (see below)
:
: 1) "dell" is not listed as a possible suggestion for "hell". Even if the dictionary thinks "hold" is a better suggestion, why isn't "dell" even included in the list of possibilities?
:
: 2) In the "collation" section, I can't make any sense of what these results mean: how is "hello ultrasharp" a suggested collationQuery when *none* of the example docs contain both "hello" and "ultrasharp"?
:
: http://localhost:8983/solr/techproducts/select?df=text&q=%2Bhello+%2Bultrasharp
:
:
: So WTF is up with these spell check results?
:
:
: <?xml version="1.0" encoding="UTF-8"?>
: <response>
:   <lst name="responseHeader">
:     <int name="status">0</int>
:     <int name="QTime">15</int>
:   </lst>
:   <str name="command">build</str>
:   <result name="response" numFound="0" start="0">
:   </result>
:   <lst name="spellcheck">
:     <lst name="suggestions">
:       <lst name="hell">
:         <int name="numFound">6</int>
:         <int name="startOffset">0</int>
:         <int name="endOffset">4</int>
:         <int name="origFreq">0</int>
:         <arr name="suggestion">
:           <lst>
:             <str name="word">hello</str>
:             <int name="freq">1</int>
:           </lst>
:           <lst>
:             <str name="word">here</str>
:             <int name="freq">2</int>
:           </lst>
:           <lst>
:             <str name="word">heat</str>
:             <int name="freq">1</int>
:           </lst>
:           <lst>
:             <str name="word">hold</str>
:             <int name="freq">1</int>
:           </lst>
:           <lst>
:             <str name="word">html</str>
:             <int name="freq">1</int>
:           </lst>
:           <lst>
:             <str name="word">héllo</str>
:             <int name="freq">1</int>
:           </lst>
:         </arr>
:       </lst>
:       <lst name="ultrashar">
:         <int name="numFound">1</int>
:         <int name="startOffset">5</int>
:         <int name="endOffset">14</int>
:         <int name="origFreq">0</int>
:         <arr name="suggestion">
:           <lst>
:             <str name="word">ultrasharp</str>
:             <int name="freq">1</int>
:           </lst>
:         </arr>
:       </lst>
:     </lst>
:     <bool name="correctlySpelled">false</bool>
:     <lst name="collations">
:       <lst name="collation">
:         <str name="collationQuery">hello ultrasharp</str>
:         <int name="hits">2</int>
:         <lst name="misspellingsAndCorrections">
:           <str name="hell">hello</str>
:           <str name="ultrashar">ultrasharp</str>
:         </lst>
:       </lst>
:       <lst name="collation">
:         <str name="collationQuery">here ultrasharp</str>
:         <int name="hits">3</int>
:         <lst name="misspellingsAndCorrections">
:           <str name="hell">here</str>
:           <str name="ultrashar">ultrasharp</str>
:         </lst>
:       </lst>
:       <lst name="collation">
:         <str name="collationQuery">heat ultrasharp</str>
:         <int name="hits">2</int>
:         <lst name="misspellingsAndCorrections">
:           <str name="hell">heat</str>
:           <str name="ultrashar">ultrasharp</str>
:         </lst>
:       </lst>
:       <lst name="collation">
:         <str name="collationQuery">hold ultrasharp</str>
:         <int name="hits">2</int>
:         <lst name="misspellingsAndCorrections">
:           <str name="hell">hold</str>
:           <str name="ultrashar">ultrasharp</str>
:         </lst>
:       </lst>
:       <lst name="collation">
:         <str name="collationQuery">html ultrasharp</str>
:         <int name="hits">2</int>
:         <lst name="misspellingsAndCorrections">
:           <str name="hell">html</str>
:           <str name="ultrashar">ultrasharp</str>
:         </lst>
:       </lst>
:     </lst>
:   </lst>
: </response>
:
:
: -Hoss
: http://www.lucidworks.com/
:

-Hoss
http://www.lucidworks.com/
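A toy Python sketch of the two behaviors James describes in the quoted reply. It is not Lucene's DirectSpellChecker or Solr's collation code; it assumes a plain Levenshtein distance with max_edits=2, and the term list is taken from the response above while the documents in DOCS are made up. candidates() only mimics the minPrefix rule (a suggestion must keep the first min_prefix characters, so "dell" can never replace "hell" at min_prefix=1), and hits() shows why a collation like "hello ultrasharp" can report hits under q.op=OR but none under AND.

```python
# Toy illustration only: not Lucene's DirectSpellChecker or Solr's collation logic.
# Assumptions: plain Levenshtein distance, max_edits=2, hypothetical documents.

TERMS = ["dell", "hello", "here", "heat", "hold", "html", "ultrasharp"]

# Hypothetical indexed docs, each reduced to its set of terms.
DOCS = [
    {"hello", "world"},
    {"hello", "there"},
    {"dell", "ultrasharp", "monitor"},
]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]


def candidates(word, terms, min_prefix=1, max_edits=2):
    """Terms within max_edits of word that keep its first min_prefix chars unchanged."""
    prefix = word[:min_prefix]
    return sorted(t for t in terms
                  if t != word
                  and t.startswith(prefix)
                  and edit_distance(word, t) <= max_edits)


def hits(query_terms, docs, op="OR"):
    """Count docs matching any term (q.op=OR) or all terms (q.op=AND)."""
    match = all if op == "AND" else any
    return sum(1 for doc in docs if match(t in doc for t in query_terms))


if __name__ == "__main__":
    print(candidates("hell", TERMS))                   # "dell" excluded by the prefix rule
    print(candidates("hell", TERMS, min_prefix=0))     # "dell" is now a candidate
    print(hits(["hello", "ultrasharp"], DOCS, "OR"))   # any single term is enough
    print(hits(["hello", "ultrasharp"], DOCS, "AND"))  # no doc has both terms
```

In the real config, the equivalents are James's two suggestions: minPrefix=0 on the spellchecker, and spellcheck.collateParam.q.op=AND on the request.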