I'm playing around with the spell checker on a 1.3 nightly build and
don't see any effect from changes to "sp.dictionary.threshold" in
terms of dictionary size.  A value of 0.0 seems to create a dictionary
of the same size and content as a value of 0.9.  (I'd expect a very
small dictionary in the latter case.)  I think sp.dictionary.threshold
is a float parameter, but maybe I'm misunderstanding?
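
To make my expectation concrete, here is a minimal sketch (in Python, not
the actual Solr/Lucene code) of what I assume the threshold does: a term
is kept only if the fraction of documents containing it is at least the
threshold.  The function name and the sample documents are made up for
illustration, and the ">=" comparison is my assumption.

```python
def build_dictionary(docs, threshold):
    """Keep terms whose document-frequency ratio is >= threshold.

    Assumed semantics only; this is not how Solr is known to implement it.
    """
    num_docs = len(docs)
    if num_docs == 0:
        return set()
    doc_freq = {}
    for doc in docs:
        for term in set(doc):  # count each term at most once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {t for t, df in doc_freq.items() if df / num_docs >= threshold}


# Hypothetical sample content: three tiny "documents" of tokens.
docs = [
    ["solr", "spell", "checker"],
    ["solr", "dictionary"],
    ["solr", "spell"],
]

print(sorted(build_dictionary(docs, 0.0)))  # every term passes
print(sorted(build_dictionary(docs, 0.9)))  # only terms in >= 90% of docs
```

Under that reading, 0.0 keeps every term while 0.9 keeps only terms that
occur in nearly all documents, which is why I'd expect the two
dictionaries to differ a lot.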

And just to be sure, I assume I can alter this parameter prior to
issuing the "rebuild" command to build the dictionary -- I don't need to
reindex termSourceField between changes?

My solrconfig.xml has this definition for the handler:

<requestHandler name="spellchecker"
class="solr.SpellCheckerRequestHandler" startup="lazy">
    <lst name="defaults">
        <int name="sp.query.suggestionCount">30</int>
        <float name="sp.query.accuracy">0.5</float>
    </lst>
    <str name="sp.dictionary.indexDir">spell</str>
    <str name="termSourceField">dictionary</str>
    <float name="sp.dictionary.threshold">0.9</float>
</requestHandler>

And schema.xml in case that is somehow relevant:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

<field name="dictionary" type="spell" indexed="true" stored="false"
    multiValued="true" omitNorms="true" />

Any advice?  I'd definitely like to tighten up the dictionary but it
appears to always include terms regardless of their frequency in the
source content.

Thanks,

Ron
