>>> For example, if we get query "tommyhitfiger" and have terms "tommy" and
>>> "hitfiger" in the index, how to fix the query?
>
> The usual approach to solving this is to index compound words, i.e. when
> producing a spellchecker dictionary add a record "tommyhitfiger" with a
> field that points to "tommy hitfiger". Details vary depending on what
> spellchecking impl. you use.
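
To check my understanding of the compound-word suggestion, here is a minimal sketch in plain Python. It is not Solr or Lucene code: the function names (build_compound_dictionary, levenshtein, suggest), the max_distance threshold, and the sample phrases are all hypothetical, and a real spell checker would use an n-gram index to pick candidates rather than scanning every record.

```python
def build_compound_dictionary(phrases):
    """Map each multi-word phrase, with spaces removed, to the spaced phrase."""
    compounds = {}
    for phrase in phrases:
        words = phrase.lower().split()
        if len(words) > 1:  # single words need no compound record
            compounds["".join(words)] = " ".join(words)
    return compounds


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def suggest(query, compounds, max_distance=2):
    """Return the spaced phrase whose compound form is closest to the query,
    or None if nothing is within max_distance edits (assumed threshold)."""
    key = query.lower().replace(" ", "")
    best = None
    for compound, phrase in compounds.items():
        distance = levenshtein(key, compound)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, phrase)
    return best[1] if best else None


if __name__ == "__main__":
    compounds = build_compound_dictionary(["tommy hilfiger", "calvin klein"])
    print(suggest("tommyhitfiger", compounds))
```

The linear scan in suggest is only there to keep the sketch short; the point is that the compound record carries a pointer back to the spaced phrase, so a single fuzzy lookup can both fix the typo and restore the space.
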
I'm using Solr's default spell checker, which uses an n-gram index and Levenshtein distance. Can it be customized to include compound words? What alternative spell checkers exist for Lucene/Solr?

I experimented with the Lucene spell checker and noticed that, when configured with a low accuracy, it can find the words "tommy" and "hilfiger" that together form the whole word. So I was able to write some logic that post-processes the spell checker results and finds the correct query "tommy hilfiger". It iterates over all possible combinations of the terms suggested by the spell checker and compares each resulting query to the original using Double Metaphone. I'm not sure this is the best solution, though; it's probably just not fast enough.

--
Andrew Klochkov
Senior Software Engineer, Grid Dynamics