[ https://issues.apache.org/jira/browse/LUCENE-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068410#comment-17068410 ]
Adrien Grand commented on LUCENE-9289:
--------------------------------------

Have you been able to measure speedups with this patch? I'm not very familiar with the spell checkers, but this method call doesn't seem to be on the critical path, as we seem to be doing initial pruning with a FuzzyTermsEnum and mostly using this method to merge results across shards, where I'm expecting the set of terms to check to be much smaller. I could be wrong though!

> Speed up Levenshtein distance calculation when we don't need the exact distance
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-9289
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9289
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Andras Salamon
>            Priority: Minor
>         Attachments: SOLR-14360-01.patch
>
>
> Sometimes when we calculate the Levenshtein distance we don't need the exact distance; we only want to know whether the strings are similar enough.
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SolrSpellChecker.java#L113-L114]
> {noformat}
> sug.score = sd.getDistance(original, sug.string);
> if (sug.score < min) continue;
> {noformat}
> If we use this threshold in the distance calculation, we can speed it up: we can stop the calculation as soon as we know that the distance will be lower than the threshold.
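
For readers unfamiliar with the idea in the issue description, here is a minimal sketch of a Levenshtein calculation that gives up early once a threshold can no longer be met. It is not the attached patch and not Lucene's API: the class name `BoundedLevenshtein`, the method `boundedDistance`, and the `maxEdits` parameter are hypothetical, and it assumes the caller has already turned the score threshold (`min` in the snippet quoted in the issue) into a maximum number of edits.

```java
// Hypothetical illustration only; not the code from SOLR-14360-01.patch.
final class BoundedLevenshtein {
    private BoundedLevenshtein() {}

    /**
     * Returns the Levenshtein distance between a and b if it is at most
     * maxEdits, or -1 as soon as the distance is known to exceed maxEdits.
     */
    static int boundedDistance(String a, String b, int maxEdits) {
        if (maxEdits < 0) {
            throw new IllegalArgumentException("maxEdits must be >= 0");
        }
        int n = a.length();
        int m = b.length();
        // The length difference is a lower bound on the distance.
        if (Math.abs(n - m) > maxEdits) {
            return -1;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Every edit path to the final cell crosses this row, so the
            // final distance cannot be smaller than the row minimum.
            if (rowMin > maxEdits) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= maxEdits ? prev[m] : -1;
    }
}
```

A caller would treat -1 the same way the quoted snippet treats a score below `min`, that is, skip the suggestion. Whether the early exit pays off in practice depends on how many candidate terms actually reach this method, which is the point raised in the comment above.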