[jira] [Resolved] (LUCENE-9237) Faster TermsEnum intersect for UniformSplit

Bruno Roustant (Jira) Fri, 28 Feb 2020 06:23:40 -0800


     [ https://issues.apache.org/jira/browse/LUCENE-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Bruno Roustant resolved LUCENE-9237.
------------------------------------
    Fix Version/s: 8.5
       Resolution: Fixed

Thanks [~dsmiley] for reviewing!

> Faster TermsEnum intersect for UniformSplit
> -------------------------------------------
>
>                 Key: LUCENE-9237
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9237
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>             Fix For: 8.5
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> New version of TermsEnum intersect for UniformSplit. It is 75% more efficient 
> than the previous version for FuzzyQuery.
> Compared to BlockTree IntersectTermsEnum:
>  - It is still slower for FuzzyQuery (between -37% and -44% in our
> benchmarks), but faster than the previous version, which was at -65%.
>  - It is on par or slightly slower for WildcardQuery (between -5% and 0%).
>  - It is slightly faster for PrefixQuery (between +5% and +10%).
>  
> When I debugged thoroughly to understand the limitation of our previous
> approach (computing the common prefix between two consecutive block keys in
> the FST), I saw that for every FuzzyQuery the common prefix actually
> matched, so we entered all blocks.
> I realized that the FuzzyQuery automaton accepts many variations of the
> prefix, so the common prefix was not long enough to filter blocks
> effectively.
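To make this limitation concrete, here is a minimal Java sketch of the earlier block-filtering idea. It rests on stated assumptions: all class and method names are hypothetical (not Lucene APIs), and a Levenshtein distance row over a target term stands in for the FuzzyQuery automaton. Because terms are sorted, every term between two consecutive block keys shares their common prefix. A block can therefore be skipped when no accepted term can start with that prefix.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of filtering blocks by the common prefix of two
 * consecutive block keys. A Levenshtein DP row stands in for the FuzzyQuery
 * automaton; none of these names are Lucene APIs.
 */
final class CommonPrefixBlockFilter {

  /** Length of the longest common prefix of two terms. */
  static int commonPrefixLength(String a, String b) {
    int n = Math.min(a.length(), b.length());
    int i = 0;
    while (i < n && a.charAt(i) == b.charAt(i)) {
      i++;
    }
    return i;
  }

  /**
   * True if some term starting with prefix is within maxEdits edits of target.
   * row[j] is the edit distance between the prefix read so far and target[0, j);
   * the prefix can still lead to a match iff min(row) <= maxEdits.
   */
  static boolean prefixCanMatch(String prefix, String target, int maxEdits) {
    int m = target.length();
    int[] row = new int[m + 1];
    for (int j = 0; j <= m; j++) {
      row[j] = j; // distance from the empty prefix to target[0, j)
    }
    for (int i = 0; i < prefix.length(); i++) {
      char c = prefix.charAt(i);
      int[] next = new int[m + 1];
      next[0] = row[0] + 1;
      for (int j = 1; j <= m; j++) {
        int substitution = row[j - 1] + (target.charAt(j - 1) == c ? 0 : 1);
        next[j] = Math.min(substitution, Math.min(row[j] + 1, next[j - 1] + 1));
      }
      row = next;
    }
    return Arrays.stream(row).min().getAsInt() <= maxEdits;
  }

  /** Enter the block [blockKey, nextBlockKey) only if its shared prefix can match. */
  static boolean shouldEnterBlock(String blockKey, String nextBlockKey, String target, int maxEdits) {
    String shared = blockKey.substring(0, commonPrefixLength(blockKey, nextBlockKey));
    return prefixCanMatch(shared, target, maxEdits);
  }
}
```

In this sketch, the first row entry always equals the prefix length. Any shared prefix no longer than maxEdits therefore passes the check. With maxEdits = 1, a one-character shared prefix such as "x" can never cause a block to be skipped, which illustrates why short common prefixes filter so little.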
> I looked at what VarGapFixedInterval did. After each term, it jumped to the
> next target term accepted by the automaton. This was efficient enough thanks
> to a vital optimization that compared the target term with the immediately
> following term, so that most of the time it did not actually need to jump.
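The VarGapFixedInterval strategy described above can be sketched as a hypothetical model, not the actual Lucene code. A sorted set of accepted terms stands in for the automaton, with `NavigableSet.ceiling` playing the role of "next term accepted by the automaton". A binary search over the sorted block terms plays the role of a seek. The sketch counts seeks so the effect of comparing the target with the following term is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;

/**
 * Hypothetical sketch: after each rejected term, compute the next accepted
 * term, but skip the jump when the immediately following term already
 * reaches it. Not a Lucene API.
 */
final class JumpAfterEachTerm {

  static final class Result {
    final List<String> matches = new ArrayList<>();
    int seeks; // jumps actually performed
  }

  static Result intersect(String[] terms, NavigableSet<String> accepted) {
    Result result = new Result();
    int i = 0;
    while (i < terms.length) {
      String term = terms[i];
      if (accepted.contains(term)) {
        result.matches.add(term);
        i++;
        continue;
      }
      String target = accepted.ceiling(term); // next term the automaton accepts
      if (target == null || i + 1 == terms.length) {
        break; // nothing left to accept, or no following term
      }
      if (terms[i + 1].compareTo(target) >= 0) {
        i++; // the following term already reaches the target: no jump needed
      } else {
        result.seeks++;
        int pos = Arrays.binarySearch(terms, i + 1, terms.length, target);
        i = pos >= 0 ? pos : -pos - 1; // first term >= target
      }
    }
    return result;
  }
}
```

With the terms {apple, apply, banana, band, bandana, can, cane} and accepted set {apply, cane}, the rejected term "apple" has target "apply". That is the very next term, so no seek happens. Only the rejection of "banana" (target "cane") triggers a real jump.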
> So I applied the same idea of computing the next accepted term and jumping,
> but now gated by a first condition on the number of consecutively rejected
> terms, and with the comparison of the accepted term against the immediately
> following term done in advance. This is the main source of the improvement.
> We also leverage other optimizations that speed up the automaton validation
> of each sequential term in the block.
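The new condition can be sketched as a hypothetical model rather than the UniformSplit implementation. The next accepted term is only computed once `rejectionThreshold` consecutive terms have been rejected; the threshold value and the reset policy are assumptions. That target is then compared with the immediately following term before deciding whether to seek. A sorted set of accepted terms stands in for the automaton, and a binary search over the sorted terms stands in for a seek. The count of target computations serves as a rough proxy for automaton work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;

/**
 * Hypothetical sketch: compute the next accepted term only after a run of
 * consecutively rejected terms, then compare it with the immediately following
 * term before deciding to seek. Not a Lucene API.
 */
final class JumpAfterRejections {

  static final class Result {
    final List<String> matches = new ArrayList<>();
    int seeks;              // jumps actually performed
    int targetComputations; // calls to the stand-in "next accepted term" function
  }

  static Result intersect(String[] terms, NavigableSet<String> accepted, int rejectionThreshold) {
    Result result = new Result();
    int consecutiveRejected = 0;
    int i = 0;
    while (i < terms.length) {
      String term = terms[i];
      if (accepted.contains(term)) {
        result.matches.add(term);
        consecutiveRejected = 0;
        i++;
        continue;
      }
      consecutiveRejected++;
      // Cheap path: keep scanning sequentially until enough terms were rejected in a row.
      if (consecutiveRejected < rejectionThreshold || i + 1 == terms.length) {
        i++;
        continue;
      }
      consecutiveRejected = 0; // assumption: restart the count after each target computation
      result.targetComputations++;
      String target = accepted.ceiling(term); // next term the automaton would accept
      if (target == null) {
        break; // no accepted term remains
      }
      if (terms[i + 1].compareTo(target) >= 0) {
        i++; // the following term already reaches the target: scan instead of seeking
      } else {
        result.seeks++;
        int pos = Arrays.binarySearch(terms, i + 1, terms.length, target);
        i = pos >= 0 ? pos : -pos - 1; // first term >= target
      }
    }
    return result;
  }
}
```

On the seven terms {apple, apply, banana, band, bandana, can, cane} with accepted set {apply, cane}, a threshold of 1 computes two targets. A threshold of 2 computes only one and still finds both matches with a single seek.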



--
This message was sent by Atlassian Jira
(v8.3.4#803005)