Spellcheck returning suggestions for words that exist in the dictionary

Sanjana Sridhar Fri, 10 Nov 2017 05:53:16 -0800

Spellcheck works as expected when I misspell a word, but when the query
word already exists in the dictionary, Solr still returns suggestions for
it. For example, "bike" gets corrected to "bake".


Unfortunately, I cannot use the *maxResultsForSuggest* parameter, as I need
to return the correct spelling regardless of whether results exist.
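
For reference, a minimal sketch of how that parameter could be set as a
request handler default. The threshold of 5 is an arbitrary assumption for
illustration and is not taken from the configuration below. With it, Solr
would return suggestions only when the query matches at most 5 documents,
which is the behaviour that does not fit this use case.

```xml
<!-- Hypothetical fragment: 5 is an illustrative threshold, not a recommended value -->
<lst name="defaults">
  <str name="spellcheck">true</str>
  <!-- Suggest only when the query returns at most this many hits -->
  <str name="spellcheck.maxResultsForSuggest">5</str>
</lst>
```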

*Is there a way to prevent Solr from suggesting a spelling if the word
already exists in the dictionary?*

I'm using both the IndexBasedSpellChecker and the FileBasedSpellChecker.


Relevant snippets from solrconfig.xml and managed-schema:

*REQUEST HANDLER*

  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="uf">*</str>
      <str name="rows">10</str>
      <str name="echoParams">explicit</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.dictionary">file_spellcheck</str>
      <str name="spellcheck.dictionary">index_spellcheck</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


*SPELLCHECK COMPONENT*

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_general</str>
      <!-- a spellchecker built from a field of the main index -->
      <lst name="spellchecker">
        <str name="name">index_spellcheck</str>
        <str name="field">content</str>
        <str name="classname">solr.IndexBasedSpellChecker</str>
        <str name="spellcheckIndexDir">spellchecker</str>
        <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
        <str name="accuracy">0.75</str>
        <int name="maxEdits">1</int>
        <int name="minPrefix">0</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">4</int>
        <!-- <float name="maxQueryFrequency">0.01</float> -->
        <!-- <float name="thresholdTokenFrequency">.01</float> -->
      </lst>

      <!-- A spellchecker that reads the list of words from a file -->
      <lst name="spellchecker">
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="name">file_spellcheck</str>
        <str name="field">content</str>
        <str name="accuracy">0.75</str>
        <str name="sourceLocation">spellings.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="spellcheckIndexDir">spellcheckerFile</str>
      </lst>
    </searchComponent>


*FIELD IN MANAGED-SCHEMA*

    <field name="content" type="text_spell" indexed="true" stored="false" multiValued="true"/>
    <fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="'" replacement="" replace="all"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                stemEnglishPossessive="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <!-- English spell check fields-->
    <copyField source="name_en" dest="content"/>
    <copyField source="desc_en" dest="content"/>
    <copyField source="keywords_en" dest="content"/>
    <copyField source="brand_name_en" dest="content"/>


Any help would be greatly appreciated.


Thank you,

Sanjana Sridhar

