Hello all, I'm getting a strange suggestion for a purposely mistyped word in Solr 1.4.1.
I search for the term "snia" and would expect the term "sina" to be suggested, since it is a fairly common word in many of the indexed documents. Instead, I'm getting "india" as the first suggestion, which is indexed only once and has (at least as far as my understanding of the algorithm goes) a greater Levenshtein distance than "sina".

The configuration for the spellchecker is pretty straightforward, basically taken directly from the examples:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="buildOnOptimize">true</str>
      <str name="buildOnCommit">true</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="comparatorClass">freq</str>
      <float name="thresholdTokenFrequency">.01</float>
    </lst>
  </searchComponent>

I tried to use the comparatorClass there (as sorting by frequency would probably yield better results for me), but only saw afterwards that it is only available in Solr 4.

The complete suggestions I get from the standard search component are:

  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="snia">
        <int name="numFound">5</int>
        <int name="startOffset">0</int>
        <int name="endOffset">4</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">india</str>
            <int name="freq">1</int>
          </lst>
          <lst>
            <str name="word">sina</str>
            <int name="freq">30</int>
          </lst>
          <lst>
            <str name="word">soa</str>
            <int name="freq">4</int>
          </lst>
          <lst>
            <str name="word">unit</str>
            <int name="freq">3</int>
          </lst>
          <lst>
            <str name="word">sei</str>
            <int name="freq">2</int>
          </lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
    </lst>
  </lst>

Apart from the india suggestion, the other ones are okay, though I need to tune my stopwords for the (German) indexer a bit more.

Is there any explanation why india is chosen over sina in the suggestions?
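As a sanity check on the distances, here is a minimal sketch in Python of plain Levenshtein distance (insert, delete, and substitute each cost 1, no transpositions). The function names are made up for illustration and are not part of Solr. The normalized score is an assumption on my side about how a spellchecker might turn a distance into a similarity; I have not verified it against the Solr 1.4.1 sources.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute each cost 1."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    # ASSUMPTION (not verified against Solr 1.4.1): the score is
    # 1 - distance / length of the longer word, so 1.0 means identical.
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for candidate in ("sina", "india"):
        print(candidate,
              levenshtein("snia", candidate),
              round(normalized_similarity("snia", candidate), 2))
```

With plain Levenshtein, both "sina" and "india" come out at distance 2 from "snia" (a swap of two adjacent letters counts as two substitutions), so the raw distance alone would not rank sina first. If the score really is normalized by the longer word's length, india would get 0.6 against 0.5 for sina, which would match the order I am seeing.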
Is there anything I can tweak in the configuration to get the desired result? If some information is missing, don't hesitate to ask and I will try to supply it.

Many thanks in advance,
Jens