RE: spellcheck.collate returning all results

Dyer, James Mon, 23 May 2011 09:54:45 -0700

Richard,

To get that guarantee you need to set "spellcheck.maxCollationTries" to a value 
other than zero (zero is the default).  Verifying beforehand that a collation 
will return hits has a cost, so this feature is "off" by default.  You may also 
want to enable extended collations with "spellcheck.collateExtendedResults" so 
you know beforehand how many hits each collation will return.  The extended 
output also details exactly which correction was substituted for which original 
misspelled word.
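
As a rough sketch only, the two parameters could be set as defaults on the 
request handler from your solrconfig.xml.  The handler name and existing 
defaults are copied from your message; the value 5 is just an illustrative 
assumption, not a recommendation, so tune it against the cost mentioned above.

```xml
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- spellcheck.* request parameters can be defaulted here or passed on the URL -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <!-- Non-zero: try up to this many collations against the index (5 is an assumed value) -->
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- Report the hit count and the per-word corrections for each collation -->
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The same parameters can instead be passed per request, for example by 
appending &spellcheck.maxCollationTries=5&spellcheck.collateExtendedResults=true 
to the query URL you are already using.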


Two things you might want to be aware of:
- This is new functionality in 3.1, so it doesn't work on 1.4 without a patch 
(see SOLR-2010 in JIRA).

- There is a critical bug in the spell check collate functionality that affects 
any use of "spellcheck.collate=true" in 3.1 and Trunk (4.x).  If using collate 
(even *without* "spellcheck.maxCollationTries") you should apply SOLR-2462 
first (see https://issues.apache.org/jira/browse/SOLR-2462 for information & a 
patch).  It is likely this (or a similar fix) will eventually get committed and 
included in the next bug-fix release, should there be one.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Richard Hodsdon [mailto:hodsdon.rich...@gmail.com] 
Sent: Monday, May 23, 2011 9:54 AM
To: solr-user@lucene.apache.org
Subject: spellcheck.collate returning all results

Hi,

I have been trying to set up spellchecking on our system using the
SpellCheckComponent.

According to the wiki, when spellcheck.collate is used, the collation should
return results if it is re-run with the same fq parameters that were passed
to the original query. So far this has not been happening.
I am getting results returned, but if I re-run the query passing the
collation as the q param, it finds nothing.

My initial query is as follows:
http://127.0.0.1:8983/solr/select?q=reeed%20bulll&spellcheck=true&spellcheck.collate=true&fq=content_type:post

and I get back the following in the spellcheck lst:
<lst name="spellcheck">
<lst name="suggestions">
<lst name="reeed">
<int name="numFound">1</int>
<int name="startOffset">0</int>
<int name="endOffset">5</int>
<arr name="suggestion">
<str>red</str>
</arr>
</lst>
<lst name="bulll">
<int name="numFound">1</int>
<int name="startOffset">6</int>
<int name="endOffset">11</int>
<arr name="suggestion">
<str>bull</str>
</arr>
</lst>
<str name="collation">red bull</str>
</lst>
</lst>

The issue arises when I run the query again using the 'correct' query:

http://127.0.0.1:8983/solr/select?q=red%20bull&spellcheck=true&spellcheck.collate=true&fq=content_type:post&wt=json

I get no responses returned. This is because of my content_type:post filter,
which is filtering correctly.

I have also run the query with spellcheck.build=true.

I have set up my solrconfig.xml as follows.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textgen</str>
    <lst name="spellchecker">
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="field">name</str>
      <str name="buildOnCommit">true</str>
      <str name="spellcheck.collate">true</str>
    </lst>
  </searchComponent>

<requestHandler name="search" class="solr.SearchHandler" default="true">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
     </lst>
     <arr name="last-components">
        <str>spellcheck</str>
     </arr>
</requestHandler>

My schema.xml declares the textgen fieldType and the name field:
<field name="name" type="textgen" indexed="true" stored="true"/>
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Thanks

Richard

        

--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-collate-returning-all-results-tp2975621p2975621.html
Sent from the Solr - User mailing list archive at Nabble.com.
