Chris,

- DirectSpellChecker has a "minPrefix" setting, which the techproducts 
example sets to 1 (also the default), so it will never try to correct the 
first character.  I think this is both a performance optimization and based 
on the assumption that we rarely misspell the first character.  This is why it 
will not correct "hell" to "dell".  I think it will allow you to set this to 
0, if you want your sample query to work.

- The "maxCollationTries" feature rewrites "q" / "spellcheck.q" and then, 
using all the other parameters, queries internally to see if there are any 
hits.  This doesn't play very well with "q.op=OR" / "mm=1", because a document 
matching just one of the terms counts as a hit.  So when you see a collation 
like "here ultrasharp" / "heat ..." etc., it is indeed getting some hits, and 
it is considered a valid query rewrite, despite the absurdity.  We could 
improve this example config by adding "spellcheck.collateParam.q.op=AND" to 
the defaults.  (When using dismax, you would add 
"spellcheck.collateParam.mm=100%".)  Also, while the "collateParam" 
functionality is in the old Solr wiki, it doesn't seem to be in the reference 
manual, so we should probably add it, as this would be pretty important for a 
lot of users.

- Unless using the legacy IndexBasedSpellChecker / FileBasedSpellChecker, you 
need not use "spellcheck.build".  It's a no-op for both Direct and WordBreak, as 
these do not use sidecar indexes.
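
For reference, the first two points above might translate into solrconfig.xml 
changes along these lines.  This is only a sketch: the element layout follows 
the usual solrconfig.xml conventions, but the component name, the "text" field 
and the handler wiring are assumptions rather than a copy of the techproducts 
config, so check them against your own file.

```xml
<!-- Sketch only: names and surrounding values are assumptions, not the
     actual techproducts solrconfig.xml. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- assumed field; the queries in this thread use df=text -->
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- 0 allows the first character to be corrected ("hell" to "dell") -->
    <int name="minPrefix">0</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- test collations with AND so every corrected term must match -->
    <str name="spellcheck.collateParam.q.op">AND</str>
    <!-- with dismax, use this instead:
         <str name="spellcheck.collateParam.mm">100%</str> -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Putting the collateParam in the handler defaults means it applies to every 
request, while still letting an individual request override it.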

So without changing the config, these queries illustrate the spellchecker 
pretty well, including the word-break functionality.

http://localhost:8983/solr/techproducts/spell?spellcheck.q=dzll+ultra%20sharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
http://localhost:8983/solr/techproducts/spell?spellcheck.q=dellultrasharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND

Spellcheck has a lot of gotchas, and I wish we could dream up a way to 
make it easier for people.  I remember it being a struggle for me when I was a 
new user, and I know we get lots of questions on the user-list about it.

My apologies to you for not answering this sooner.

James Dyer
Ingram Content Group


-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, December 17, 2014 6:49 PM
To: solr-user@lucene.apache.org
Subject: can't make sense of spellchecker results when using techproducts 
example


Ok, so i've been working on updating the ref guide to account for the new 
way to run the "examples" in 5.0.

The spell checking page...

        https://cwiki.apache.org/confluence/display/solr/Spell+Checking

...has some examples that loosely correlate to the "techproducts" 
example, but even if you ignore the specifics of those examples, i need 
help understanding the basic behavior of the spellchecker as configured in 
the techproducts example.

Assuming you run this...

        bin/solr -e techproducts

...with that example running & those docs indexed, this URL gives me 
results i can't explain...

http://localhost:8983/solr/techproducts/spell?spellcheck.q=hell+ultrashar&df=text&spellcheck=true&spellcheck.build=true

(see below)

1) "dell" is not listed as a possible suggestion for "hell" (even if 
the dictionary thinks "hold" is a better suggestion, why isn't "dell" even 
included in the list of possibilities?)

2) in the "collation" section, i can't make any sense of what these 
results mean -- how is "hello ultrasharp" a suggested collationQuery when 
*none* of the example docs contain both "hello" and "ultrasharp" ?

http://localhost:8983/solr/techproducts/select?df=text&q=%2Bhello+%2Bultrasharp


So WTF is up with these spell check results?


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">15</int>
</lst>
<str name="command">build</str>
<result name="response" numFound="0" start="0">
</result>
<lst name="spellcheck">
   <lst name="suggestions">
     <lst name="hell">
       <int name="numFound">6</int>
       <int name="startOffset">0</int>
       <int name="endOffset">4</int>
       <int name="origFreq">0</int>
       <arr name="suggestion">
         <lst>
           <str name="word">hello</str>
           <int name="freq">1</int>
         </lst>
         <lst>
           <str name="word">here</str>
           <int name="freq">2</int>
         </lst>
         <lst>
           <str name="word">heat</str>
           <int name="freq">1</int>
         </lst>
         <lst>
           <str name="word">hold</str>
           <int name="freq">1</int>
         </lst>
         <lst>
           <str name="word">html</str>
           <int name="freq">1</int>
         </lst>
         <lst>
           <str name="word">héllo</str>
           <int name="freq">1</int>
         </lst>
       </arr>
     </lst>
     <lst name="ultrashar">
       <int name="numFound">1</int>
       <int name="startOffset">5</int>
       <int name="endOffset">14</int>
       <int name="origFreq">0</int>
       <arr name="suggestion">
         <lst>
           <str name="word">ultrasharp</str>
           <int name="freq">1</int>
         </lst>
       </arr>
     </lst>
   </lst>
   <bool name="correctlySpelled">false</bool>
   <lst name="collations">
     <lst name="collation">
       <str name="collationQuery">hello ultrasharp</str>
       <int name="hits">2</int>
       <lst name="misspellingsAndCorrections">
         <str name="hell">hello</str>
         <str name="ultrashar">ultrasharp</str>
       </lst>
     </lst>
     <lst name="collation">
       <str name="collationQuery">here ultrasharp</str>
       <int name="hits">3</int>
       <lst name="misspellingsAndCorrections">
         <str name="hell">here</str>
         <str name="ultrashar">ultrasharp</str>
       </lst>
     </lst>
     <lst name="collation">
       <str name="collationQuery">heat ultrasharp</str>
       <int name="hits">2</int>
       <lst name="misspellingsAndCorrections">
         <str name="hell">heat</str>
         <str name="ultrashar">ultrasharp</str>
       </lst>
     </lst>
     <lst name="collation">
       <str name="collationQuery">hold ultrasharp</str>
       <int name="hits">2</int>
       <lst name="misspellingsAndCorrections">
         <str name="hell">hold</str>
         <str name="ultrashar">ultrasharp</str>
       </lst>
     </lst>
     <lst name="collation">
       <str name="collationQuery">html ultrasharp</str>
       <int name="hits">2</int>
       <lst name="misspellingsAndCorrections">
         <str name="hell">html</str>
         <str name="ultrashar">ultrasharp</str>
       </lst>
     </lst>
   </lst>
</lst>
</response>






-Hoss
http://www.lucidworks.com/
