[jira] [Commented] (LUCENE-9289) Speed up Levenshtein distance calculation when we don't need the exact distance

Adrien Grand (Jira) Fri, 27 Mar 2020 01:23:09 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068410#comment-17068410
 ]


Adrien Grand commented on LUCENE-9289:
--------------------------------------

Have you been able to measure speedups with this patch? I'm not very familiar 
with the spell checkers but this method call doesn't seem to be on the critical 
path, as we seem to be doing initial pruning with a FuzzyTermsEnum and mostly 
using this method to merge results across shards, where I'm expecting the set 
of terms to check to be much smaller. I could be wrong though!

> Speed up Levenshtein distance calculation when we don't need the exact 
> distance
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-9289
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9289
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Andras Salamon
>            Priority: Minor
>         Attachments: SOLR-14360-01.patch
>
>
> Sometimes when we calculate the Levenshtein distance we don't need the exact 
> distance; we only want to know whether the strings are similar enough.
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SolrSpellChecker.java#L113-L114]
> {noformat}
> sug.score = sd.getDistance(original, sug.string);
> if (sug.score < min) continue;
> {noformat}
> If we use this threshold in the distance calculation, we can speed it up: we 
> can stop the calculation as soon as we know that the distance will be lower 
> than the threshold.
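
The early-termination idea described above can be sketched as follows. This is 
only an illustration, not the attached patch: the class name BoundedLevenshtein, 
the method distanceWithin and the maxDistance parameter are hypothetical. It 
also assumes the similarity threshold has already been converted into a maximum 
edit distance, for example maxDistance = floor((1 - min) * maxLen) if the score 
is computed as 1 - distance / maxLen.

```java
final class BoundedLevenshtein {

    private BoundedLevenshtein() {}

    /**
     * Returns the Levenshtein distance between a and b if it is at most
     * maxDistance, or -1 as soon as the distance is known to exceed it.
     * Hypothetical helper for illustration; not part of Lucene or Solr.
     */
    static int distanceWithin(String a, String b, int maxDistance) {
        if (maxDistance < 0) {
            throw new IllegalArgumentException("maxDistance must be >= 0");
        }
        int n = a.length();
        int m = b.length();
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(n - m) > maxDistance) {
            return -1;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Every edit path crosses this row, so the final distance
            // cannot be smaller than the smallest value in the row.
            if (rowMin > maxDistance) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= maxDistance ? prev[m] : -1;
    }

    public static void main(String[] args) {
        System.out.println(distanceWithin("kitten", "sitting", 3));
        System.out.println(distanceWithin("kitten", "sitting", 2));
    }
}
```

The saving comes from the two early returns: candidates whose lengths differ 
too much are rejected without filling any row, and the row-minimum check stops 
the loop once no alignment can stay within the bound. Whether this matters in 
practice depends on how many candidates reach this code path.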



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

