>
>
>>> For example, if we get query "tommyhitfiger" and have terms "tommy" and
>>> "hitfiger" in the index, how to fix the query?
>>>
>>
> The usual approach to solving this is to index compound words, i.e. when
> producing a spellchecker dictionary add a record "tommyhitfiger" with a
> field that points to "tommy hitfiger". Details vary depending on what
> spellchecking impl. you use.
>

I'm using Solr's default spell checker, which uses an n-gram index and
Levenshtein distance. Can it be customized to include compound words? What
alternative spell checkers exist for Lucene/Solr?

I experimented with the Lucene spell checker and noticed that, when
configured with a low accuracy, it can find the words "tommy" and "hilfiger",
which together form the whole word. So I was able to write some logic that
post-processes the spell checker results and finds the correct query
"tommy hilfiger". It simply iterates over all possible combinations of the
terms suggested by the spell checker and compares each resulting query to the
original using Double Metaphone. I'm not sure this is the best solution,
though; it is probably not fast enough.
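
For illustration, here is a minimal sketch of that kind of post-processing,
not the actual code. It is written in Python rather than against the Lucene
API, and it swaps Double Metaphone for plain Levenshtein distance so that it
needs no external library. The names best_compound, max_terms and
max_distance are hypothetical, and the suggestion list stands in for whatever
the spell checker returns at low accuracy.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def best_compound(query, suggestions, max_terms=2, max_distance=2):
    """Return the suggested terms whose concatenation is closest to query.

    Tries every ordered combination of up to max_terms suggestions and
    returns them joined by spaces, or None if nothing is within
    max_distance edits of the original query.
    """
    best, best_dist = None, max_distance + 1
    for n in range(1, max_terms + 1):
        for combo in permutations(suggestions, n):
            dist = edit_distance(query, "".join(combo))
            if dist < best_dist:
                best, best_dist = combo, dist
    return " ".join(best) if best else None


# Assumed spell checker output for the misspelled query.
suggestions = ["tommy", "hilfiger", "tomato", "finger"]
print(best_compound("tommyhitfiger", suggestions))  # tommy hilfiger
```

The number of combinations grows quickly with the number of suggestions and
with max_terms, which is the speed concern mentioned above.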

-- 
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics
