James: everything you said made perfect sense, and in hindsight was 
actually covered on the page -- it was just the example that was bogus in 
light of the current config & defaults.

I went ahead and fixed it based on your feedback, and beefed up the 
explanation of spellcheck.collateParam.* (now it's part of the table 
instead of just a one-off sentence out of context).

https://cwiki.apache.org/confluence/display/solr/Spell+Checking
https://cwiki.apache.org/confluence/pages/diffpages.action?pageId=32604254&originalId=50859120

thanks!



: Date: Fri, 9 Jan 2015 14:22:43 -0600
: From: "Dyer, James" <james.d...@ingramcontent.com>
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
: Subject: RE: can't make sense of spellchecker results when using techproducts
:     example
: 
: Chris,
: 
: - DirectSpellChecker has a setting for "minPrefix" which the techproducts 
example sets to 1 (also the default).  So it will never try to correct the 
first character.  I think this is both a performance optimization and is based 
on the assumption that we rarely misspell the first character.  This is why it 
will not correct "hell" to "dell".  I think it will allow you to set this to 
0, if you want your sample query to work.
: 
: - The "maxCollationTries" feature re-writes "q" / "spellcheck.q", and then 
using all the other parameters, queries internally to see if there are any hits.  
This doesn't play very well when "q.op=OR" / "mm=1".  So when you see a 
collation like "here ultrasharp" / "heat ..." etc, you see it is indeed getting 
some hits.  So it considers it a valid query re-write, despite the absurdity.  
We could improve this example config by adding 
"spellcheck.collateParam.q.op=AND" to the defaults.  (When using dismax, you 
would add "spellcheck.collateParam.mm=100%")  Also, while the "collateParam" 
functionality is in the old Solr wiki, it doesn't seem to be in the reference 
manual, so we probably should add it as this would be pretty important for a 
lot of users.
: 
: - Unless using the legacy IndexBasedSpellChecker / FileBasedSpellChecker, you 
need not use "spellcheck.build".  It's a no-op for both Direct and WordBreak, as 
these do not use sidecar indexes.
: 
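[Inline note: the two config changes suggested in the bullets above could be 
sketched roughly as follows.  This is a hedged fragment, not the shipped 
techproducts solrconfig.xml: the spellchecker name "default", the field 
"text", and the layout of the "/spell" handler are assumptions about that 
example config, and unrelated settings are omitted.  Only minPrefix=0 and 
spellcheck.collateParam.q.op=AND come from the discussion itself.]

```xml
<!-- Sketch only: fragments meant to sit inside <config> in solrconfig.xml. -->

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <!-- "default" and "text" are assumed names; adjust to your config -->
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- 0 should let the first character be corrected ("hell" to "dell");
         the example config uses 1 -->
    <int name="minPrefix">0</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.collate">true</str>
    <!-- illustrative value; collations are only test-queried when this is > 0 -->
    <str name="spellcheck.maxCollationTries">10</str>
    <!-- require all terms to match when a collation is test-queried -->
    <str name="spellcheck.collateParam.q.op">AND</str>
    <!-- with dismax, the analogous override would be:
         <str name="spellcheck.collateParam.mm">100%</str> -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

[With the collateParam default in place, passing 
spellcheck.collateParam.q.op=AND on each request, as the URLs below do, 
should no longer be necessary.]
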
: So without changing the config, these queries illustrate the spellchecker 
pretty well, including the word-break functionality.
: 
: 
http://localhost:8983/solr/techproducts/spell?spellcheck.q=dzll+ultra%20sharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
: 
http://localhost:8983/solr/techproducts/spell?spellcheck.q=dellultrasharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
: 
: Spellcheck has a lot of gotchas, and I wish we could dream up a way to 
make it easy for people.  I remember it being a struggle for me when I was a 
new user, and I know we get lots of questions on the user-list about it.
: 
: My apologies to you for not answering this sooner.
: 
: James Dyer
: Ingram Content Group
: 
: 
: -----Original Message-----
: From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
: Sent: Wednesday, December 17, 2014 6:49 PM
: To: solr-user@lucene.apache.org
: Subject: can't make sense of spellchecker results when using techproducts 
example
: 
: 
: Ok, so i've been working on updating the ref guide to account for the new 
: way to run the "examples" in 5.0.
: 
: The spell checking page...
: 
:       https://cwiki.apache.org/confluence/display/solr/Spell+Checking
: 
: ...has some examples that loosely correlate to the "techproducts" 
: example, but even if you ignore the specifics of those examples, i need 
: help understanding the basic behavior of the spellchecker as configured in 
: the techproducts example.
: 
: Assuming you run this...
: 
:       bin/solr -e techproducts
: 
: ...with that example running & those docs indexed, this URL gives me 
: results i can't explain...
: 
: 
http://localhost:8983/solr/techproducts/spell?spellcheck.q=hell+ultrashar&df=text&spellcheck=true&spellcheck.build=true
: 
: (see below)
: 
: 1) "dell" is not listed as a possible suggestion for "hell" (even if 
: the dictionary thinks "hold" is a better suggestion, why isn't "dell" even 
: included in the list of possibilities?)
: 
: 2) in the "collation" section, i can't make any sense of what these 
: results mean -- how is "hello ultrasharp" a suggested collationQuery when 
: *none* of the example docs contain both "hello" and "ultrasharp" ?
: 
: 
http://localhost:8983/solr/techproducts/select?df=text&q=%2Bhello+%2Bultrasharp
: 
: 
: So WTF is up with these spell check results?
: 
: 
: <?xml version="1.0" encoding="UTF-8"?>
: <response>
: 
: <lst name="responseHeader">
:    <int name="status">0</int>
:    <int name="QTime">15</int>
: </lst>
: <str name="command">build</str>
: <result name="response" numFound="0" start="0">
: </result>
: <lst name="spellcheck">
:    <lst name="suggestions">
:      <lst name="hell">
:        <int name="numFound">6</int>
:        <int name="startOffset">0</int>
:        <int name="endOffset">4</int>
:        <int name="origFreq">0</int>
:        <arr name="suggestion">
:          <lst>
:            <str name="word">hello</str>
:            <int name="freq">1</int>
:          </lst>
:          <lst>
:            <str name="word">here</str>
:            <int name="freq">2</int>
:          </lst>
:          <lst>
:            <str name="word">heat</str>
:            <int name="freq">1</int>
:          </lst>
:          <lst>
:            <str name="word">hold</str>
:            <int name="freq">1</int>
:          </lst>
:          <lst>
:            <str name="word">html</str>
:            <int name="freq">1</int>
:          </lst>
:          <lst>
:            <str name="word">héllo</str>
:            <int name="freq">1</int>
:          </lst>
:        </arr>
:      </lst>
:      <lst name="ultrashar">
:        <int name="numFound">1</int>
:        <int name="startOffset">5</int>
:        <int name="endOffset">14</int>
:        <int name="origFreq">0</int>
:        <arr name="suggestion">
:          <lst>
:            <str name="word">ultrasharp</str>
:            <int name="freq">1</int>
:          </lst>
:        </arr>
:      </lst>
:    </lst>
:    <bool name="correctlySpelled">false</bool>
:    <lst name="collations">
:      <lst name="collation">
:        <str name="collationQuery">hello ultrasharp</str>
:        <int name="hits">2</int>
:        <lst name="misspellingsAndCorrections">
:          <str name="hell">hello</str>
:          <str name="ultrashar">ultrasharp</str>
:        </lst>
:      </lst>
:      <lst name="collation">
:        <str name="collationQuery">here ultrasharp</str>
:        <int name="hits">3</int>
:        <lst name="misspellingsAndCorrections">
:          <str name="hell">here</str>
:          <str name="ultrashar">ultrasharp</str>
:        </lst>
:      </lst>
:      <lst name="collation">
:        <str name="collationQuery">heat ultrasharp</str>
:        <int name="hits">2</int>
:        <lst name="misspellingsAndCorrections">
:          <str name="hell">heat</str>
:          <str name="ultrashar">ultrasharp</str>
:        </lst>
:      </lst>
:      <lst name="collation">
:        <str name="collationQuery">hold ultrasharp</str>
:        <int name="hits">2</int>
:        <lst name="misspellingsAndCorrections">
:          <str name="hell">hold</str>
:          <str name="ultrashar">ultrasharp</str>
:        </lst>
:      </lst>
:      <lst name="collation">
:        <str name="collationQuery">html ultrasharp</str>
:        <int name="hits">2</int>
:        <lst name="misspellingsAndCorrections">
:          <str name="hell">html</str>
:          <str name="ultrashar">ultrasharp</str>
:        </lst>
:      </lst>
:    </lst>
: </lst>
: </response>
: 
: 
: 
: 
: 
: 
: -Hoss
: http://www.lucidworks.com/
: 
: 

-Hoss
http://www.lucidworks.com/
