I've been away from parsers for a bit, but you should be able to override
getFuzzyQuery() (or a similar method) in a query parser subclass fairly easily.
Again, last time I looked, it used the automaton (fast) for edit distances <= 2
and fell back to the truly slow implementation for distances > 2. Note that
transposition is only supported by the automaton, not yet by SlowFuzzyQuery.
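To make the transposition point concrete, here is a minimal, self-contained
sketch in plain Java. It is not Lucene code; the class and method names are
made up for illustration. It computes an edit distance either with or without
counting a swap of two adjacent characters as a single edit, which is the
difference you would see between the two code paths for a typo like "lucnee".

```java
public class EditDistanceDemo {

    // Edit distance between a and b. When allowTranspositions is true, swapping
    // two adjacent characters costs 1 (optimal string alignment); otherwise a
    // swap has to be paid for as two separate edits (classic Levenshtein).
    public static int distance(String a, String b, boolean allowTranspositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete everything
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert everything
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // delete, insert
                        d[i - 1][j - 1] + cost);                    // match or substitute
                if (allowTranspositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent swap
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee", false)); // prints 2
        System.out.println(distance("lucene", "lucnee", true));  // prints 1
    }
}
```

So a term that is within distance 2 when transpositions count as one edit may
need a larger distance once they do not.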
<self-promotion>Might want to take a look at LUCENE-5205 and SOLR-5410. Those
offer a parser that uses SlowFuzzyQuery for exactly your use
case.</self-promotion>
The recommended solution for handling fuzziness > 2 (I think), though, is to
use character ngrams as in the SpellChecker.
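As a rough illustration of the ngram idea, here is a small sketch in plain
Java. It is not the SpellChecker's actual implementation, and all names in it
are hypothetical: it just breaks words into character ngrams and keeps the
dictionary words that share enough of them with the query term. In practice
you would then re-rank or filter those candidates with a real edit distance.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NGramCandidates {

    // All distinct character n-grams of a word, e.g. n = 2: "solr" -> so, ol, lr
    public static Set<String> ngrams(String word, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Number of distinct n-grams the two words have in common
    public static int sharedGrams(String a, String b, int n) {
        Set<String> shared = new HashSet<>(ngrams(a, n));
        shared.retainAll(ngrams(b, n));
        return shared.size();
    }

    // Dictionary words that share at least minShared n-grams with the query
    public static List<String> candidates(String query, List<String> dictionary,
                                          int n, int minShared) {
        List<String> out = new ArrayList<>();
        for (String word : dictionary) {
            if (sharedGrams(query, word, n) >= minShared) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("fuzzie", "query", "buzz");
        // "fuzzy" shares fu, uz, zz with "fuzzie" and uz, zz with "buzz"
        System.out.println(candidates("fuzzy", dictionary, 2, 2)); // prints [fuzzie, buzz]
    }
}
```

The appeal is that the ngram lookup is cheap no matter how large the allowed
edit distance is; the expensive distance computation only runs on the few
candidates that survive.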
Best,
Tim
-----Original Message-----
From: Michael Tobias [mailto:[email protected]]
Sent: Sunday, June 29, 2014 8:17 PM
To: [email protected]
Subject: SlowFuzzySearch
Hi guys
I know that Solr now has a fast fuzzy search capability for Levenshtein
distances of up to 2, but I would like to use distances of 3 or 4 (up to half
the word length if possible).
I have been told it is possible to use an older fuzzy search version called
SlowFuzzyQuery but I am not sure how to use it. I realise it will be slow(er)
but my database will be reasonably small and I would like to test out the
performance to see if it is a feasible option. Is it still part of the Solr
code or must I install it separately?
Are there any examples of its usage? And for distances of 2 or less, does it
actually perform a fast fuzzy search, or must I revert to using the ~ syntax
for those faster fuzzy searches?
All help appreciated.
Michael