[ https://issues.apache.org/jira/browse/LUCENE-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047674#comment-17047674 ]
ASF subversion and git services commented on LUCENE-9237:
---------------------------------------------------------

Commit 7302effd9c6d7dde203992df428f0c8d2389bfb3 in lucene-solr's branch refs/heads/branch_8x from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7302eff ]

LUCENE-9237: Faster UniformSplit intersect TermsEnum.

> Faster TermsEnum intersect for UniformSplit
> -------------------------------------------
>
>                 Key: LUCENE-9237
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9237
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> New version of TermsEnum intersect for UniformSplit. It is 75% more efficient
> than the previous version for FuzzyQuery.
> Compared to BlockTree IntersectTermsEnum:
> - It is still slower for FuzzyQuery (between -37% and -44% in our
> benchmarks) but it is faster than the previous version (which was -65%).
> - It is on par or slightly slower for WildcardQuery (between -5% and 0%).
> - It is slightly faster for PrefixQuery (between +5% and +10%).
>
> When I debugged thoroughly to understand what the limitation of the
> previous approach was (computing the common prefix between two
> consecutive block keys in the FST), I saw that for every FuzzyQuery
> the common prefix matched, so we entered all blocks.
> I realized that the FuzzyQuery automaton accepts many variations of the
> prefix, and the common prefix was not long enough to let us filter
> correctly.
> I looked at what VarGapFixedInterval did. It jumped after each term to
> find the next target term accepted by the automaton. This was efficient
> enough thanks to a vital optimization that compared the target term to
> the immediately following term, so that most of the time it did not
> actually jump.
> So I applied the same idea to compute the next accepted term and jump, but
> now with a first condition based on the number of consecutively rejected
> terms, and by anticipating the comparison of the accepted term with the
> immediate next term. This is the main factor of the improvement. We also
> leverage other optimizations that speed up the automaton validation of each
> sequential term in the block.
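
The strategy described above (scan terms sequentially, compute the next
accepted term only after several consecutive rejections, and skip the jump
when that term is not beyond the immediately following term) can be sketched
roughly as below. This is an illustration only, not the code from the commit:
the class IntersectSketch, the Acceptor interface, acceptorOf, and the
maxRejects parameter are all hypothetical names, a sorted in-memory list
stands in for the term blocks, a TreeSet stands in for the query automaton,
and a binary search stands in for a seek through the block index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class IntersectSketch {

    /** Hypothetical stand-in for the query automaton. */
    interface Acceptor {
        boolean accepts(String term);

        /** Smallest accepted term that is >= the given term, or null if none. */
        String nextAcceptedAtOrAfter(String term);
    }

    /** Toy acceptor backed by a finite sorted set (assumption for the demo). */
    static Acceptor acceptorOf(String... accepted) {
        TreeSet<String> set = new TreeSet<>(Arrays.asList(accepted));
        return new Acceptor() {
            @Override
            public boolean accepts(String term) {
                return set.contains(term);
            }

            @Override
            public String nextAcceptedAtOrAfter(String term) {
                return set.ceiling(term);
            }
        };
    }

    /** Returns the terms of sortedTerms accepted by the acceptor, in order. */
    static List<String> intersect(List<String> sortedTerms, Acceptor acceptor, int maxRejects) {
        List<String> matches = new ArrayList<>();
        int rejected = 0; // number of consecutively rejected terms
        int i = 0;
        while (i < sortedTerms.size()) {
            String term = sortedTerms.get(i);
            if (acceptor.accepts(term)) {
                matches.add(term);
                rejected = 0;
                i++;
                continue;
            }
            // First condition: keep scanning until enough terms were rejected in a row.
            if (++rejected < maxRejects) {
                i++;
                continue;
            }
            rejected = 0;
            String target = acceptor.nextAcceptedAtOrAfter(term);
            if (target == null || i + 1 >= sortedTerms.size()) {
                break; // nothing further can match
            }
            // Anticipated comparison: if the target is not beyond the immediate
            // next term, stepping forward is enough and no jump is needed.
            if (target.compareTo(sortedTerms.get(i + 1)) <= 0) {
                i++;
                continue;
            }
            // Otherwise jump to the first term >= target.
            int pos = Collections.binarySearch(sortedTerms, target);
            i = pos >= 0 ? pos : -pos - 1;
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("ant", "bat", "cat", "dog", "eel", "fox", "gnu", "hen");
        System.out.println(intersect(terms, acceptorOf("bat", "gnu", "zzz"), 2)); // prints [bat, gnu]
    }
}
```

In this toy run, "cat" and "dog" are rejected in a row, the next accepted
term "gnu" lies beyond the following term "eel", and the loop jumps straight
to "gnu". A larger maxRejects value makes the loop behave more like a plain
sequential scan.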