I'm playing around with the spell checker in a 1.3 nightly build and
don't see any effect on dictionary size when I change
"sp.dictionary.threshold". A value of 0.0 seems to create a dictionary
of the same size and content as a value of 0.9. (I'd expect a very
small dictionary in the latter case.) I think sp.dictionary.threshold
is a float parameter, but maybe I'm misunderstanding?
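To make my expectation concrete, here is a rough Python sketch of the filtering I assumed the threshold performs. This is only my guess at the semantics (keep a term if the fraction of documents containing it is at least the threshold), not something taken from the Solr source. The function name and sample data are made up for illustration.

```python
def build_dictionary(docs, threshold):
    """Keep terms that occur in at least `threshold` (a fraction) of the documents.

    Assumed semantics only; `docs` is a list of token lists.
    """
    num_docs = len(docs)
    doc_freq = {}
    for tokens in docs:
        for term in set(tokens):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # Compare each term's document-frequency fraction against the threshold.
    return sorted(t for t, df in doc_freq.items() if df / num_docs >= threshold)


# Hypothetical three-document corpus: "solr" is in all documents, the rest in one each.
docs = [["solr", "spell"], ["solr", "lucene"], ["solr", "index"]]
print(build_dictionary(docs, 0.0))  # every term survives
print(build_dictionary(docs, 0.9))  # only terms in at least 90% of documents
```

Under that reading, 0.0 keeps everything and 0.9 keeps only "solr", which is the kind of difference I'm not seeing.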
And just to be sure, I assume I can alter this parameter prior to
issuing the "rebuild" command to build the dictionary -- I don't need to
reindex termSourceField between changes?
My solrconfig.xml has this definition for the handler:
<requestHandler name="spellchecker"
                class="solr.SpellCheckerRequestHandler" startup="lazy">
  <lst name="defaults">
    <int name="sp.query.suggestionCount">30</int>
    <float name="sp.query.accuracy">0.5</float>
  </lst>
  <str name="sp.dictionary.indexDir">spell</str>
  <str name="termSourceField">dictionary</str>
  <float name="sp.dictionary.threshold">0.9</float>
</requestHandler>
And here is the relevant part of schema.xml, in case it matters:
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<field name="dictionary" type="spell" indexed="true" stored="false"
       multiValued="true" omitNorms="true" />
Any advice? I'd definitely like to tighten up the dictionary, but it
appears to include every term regardless of its frequency in the
source content.
Thanks,
Ron