[PR] Avoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

via GitHub Sat, 04 May 2024 03:20:51 -0700


zhongshanhao opened a new pull request, #13343:
URL: https://github.com/apache/lucene/pull/13343


   Sometimes, because it has to decode impacts and compute maximum scores, `ImpactsDISI` adds more overhead than the skipping it enables can save.
   
   Consider the following query:
   
   ```
   +title:a +title:b +title:c +title:d
   ```
   
   The terms a, b, c, and d all have large document frequencies.
   
   If the query result set is small, a minimum competitive score may never be produced, yet `BlockMaxConjunctionBulkScorer` and `BlockMaxConjunctionScorer` still compute the max score at the beginning of `advance`.
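
   A top-k collector can only derive a threshold once it holds k hits. A query that matches fewer documents than the requested number of hits therefore never produces one. The sketch below is a simplified, hypothetical model. The class `TopKThresholdSketch` is made up and is not Lucene's `TopScoreDocCollector`. It uses `0` to mean "no threshold set".

```java
import java.util.PriorityQueue;

/**
 * Hypothetical toy model (not Lucene's TopScoreDocCollector): a top-k collector
 * that reports a minimum competitive score only once it holds k hits.
 */
public class TopKThresholdSketch {
    private final int k;
    // Min-heap of the k best scores seen so far; its head is the weakest of them.
    private final PriorityQueue<Float> heap = new PriorityQueue<>();

    public TopKThresholdSketch(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    public void collect(float score) {
        if (heap.size() < k) {
            heap.add(score);
        } else if (score > heap.peek()) {
            // Replace the weakest retained hit with the better one.
            heap.poll();
            heap.add(score);
        }
    }

    /** Returns 0 ("not set") until k hits have been collected. */
    public float minCompetitiveScore() {
        return heap.size() < k ? 0f : heap.peek();
    }

    public static void main(String[] args) {
        // Only 3 hits for a top-10 request: the threshold is never set.
        TopKThresholdSketch small = new TopKThresholdSketch(10);
        for (float s : new float[] {1.5f, 2.0f, 0.7f}) {
            small.collect(s);
        }
        System.out.println(small.minCompetitiveScore()); // 0.0

        // Top-2 request with 4 hits: the threshold becomes the 2nd-best score.
        TopKThresholdSketch full = new TopKThresholdSketch(2);
        for (float s : new float[] {1.5f, 2.0f, 0.7f, 3.0f}) {
            full.collect(s);
        }
        System.out.println(full.minCompetitiveScore()); // 2.0
    }
}
```

   In this model, every scorer that consults the threshold before the heap is full is working against a value of `0`, which can never rule anything out.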
   
   This PR addresses the problem by avoiding the use of `ImpactsDISI` when no minimum competitive score has been set.
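
   The proposed guard can be illustrated with a toy model of block-max skipping. This is a hypothetical, heavily simplified sketch, not Lucene's `ImpactsDISI`; `BlockMaxSketch`, `maxScoreLookups`, and `bypassWithoutThreshold` are made-up names. Postings are split into fixed-size blocks with a precomputed max score, and `advance` skips every block whose max score is below the minimum competitive score. While no threshold is set (modelled as `0`, with non-negative scores), no block can ever be skipped, so each max-score lookup is pure overhead. The guarded variant bypasses those lookups.

```java
/**
 * Hypothetical toy model of block-max skipping (not Lucene's ImpactsDISI).
 * Postings are split into fixed-size blocks, each with a precomputed max score.
 * Scores are assumed to be non-negative, so a threshold of 0 means "not set".
 */
public class BlockMaxSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;       // sorted doc IDs
    private final float[] blockMax; // max score of each block
    private final int blockSize;
    private final boolean bypassWithoutThreshold;
    private float minCompetitiveScore = 0f;
    private int idx = -1;
    // Stands in for the cost of decoding impacts and computing a max score.
    int maxScoreLookups = 0;

    BlockMaxSketch(int[] docs, float[] scores, int blockSize, boolean bypassWithoutThreshold) {
        this.docs = docs;
        this.blockSize = blockSize;
        this.bypassWithoutThreshold = bypassWithoutThreshold;
        this.blockMax = new float[(docs.length + blockSize - 1) / blockSize];
        for (int i = 0; i < docs.length; i++) {
            int block = i / blockSize;
            blockMax[block] = Math.max(blockMax[block], scores[i]);
        }
    }

    void setMinCompetitiveScore(float score) {
        minCompetitiveScore = score;
    }

    /** Returns the first doc >= target that lies in a competitive block. */
    int advance(int target) {
        int i = idx + 1;
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        boolean thresholdSet = minCompetitiveScore > 0f;
        if (thresholdSet || !bypassWithoutThreshold) {
            // Skip whole blocks whose max score cannot reach the threshold.
            while (i < docs.length) {
                maxScoreLookups++;
                int block = i / blockSize;
                if (blockMax[block] >= minCompetitiveScore) {
                    break;
                }
                i = (block + 1) * blockSize;
            }
        }
        idx = i;
        return i < docs.length ? docs[i] : NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        int n = 1_000;
        int[] docs = new int[n];
        float[] scores = new float[n];
        for (int i = 0; i < n; i++) {
            docs[i] = 2 * i;          // every even doc ID matches
            scores[i] = 1f + (i % 7); // arbitrary positive scores
        }
        for (boolean bypass : new boolean[] {false, true}) {
            BlockMaxSketch it = new BlockMaxSketch(docs, scores, 128, bypass);
            int hits = 0;
            for (int d = it.advance(0); d != NO_MORE_DOCS; d = it.advance(d + 1)) {
                hits++;
            }
            System.out.println("bypass=" + bypass + " hits=" + hits + " lookups=" + it.maxScoreLookups);
        }
    }
}
```

   Running `main` prints `bypass=false hits=1000 lookups=1000` followed by `bypass=true hits=1000 lookups=0`. Both variants return the same documents, but only the unguarded one pays for lookups that cannot skip anything. Here a lookup is a cheap array read; the premise of this PR is that in the real iterator it involves decoding impacts, which is what makes it worth avoiding.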
   
   Here are the benchmark results of this PR on wikimediumall.
   
   iter 4: 
   ```
                          Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
                   AndHighHigh         13.42   (6.3%)                    12.65   (5.5%)     -5.7% ( -16% -    6%)   0.131
                    OrHighHigh         11.19  (12.1%)                    10.59  (10.1%)     -5.4% ( -24% -   19%)   0.441
             HighTermMonthSort       1940.04   (2.8%)                  1884.34   (6.9%)     -2.9% ( -12% -    7%)   0.390
                    AndHighMed         72.28   (4.5%)                    70.47   (4.2%)     -2.5% ( -10% -    6%)   0.358
              HighSloppyPhrase          4.89   (4.9%)                     4.77   (4.9%)     -2.5% ( -11% -    7%)   0.428
         BrowseMonthSSDVFacets          3.17   (2.3%)                     3.11   (1.7%)     -1.8% (  -5% -    2%)   0.145
                 OrNotHighHigh        151.65   (3.2%)                   148.99   (3.2%)     -1.8% (  -7% -    4%)   0.385
                  OrHighNotMed        272.56   (6.0%)                   267.90   (4.6%)     -1.7% ( -11% -    9%)   0.613
                      HighTerm        283.60   (5.9%)                   278.93   (5.4%)     -1.6% ( -12% -   10%)   0.647
          MedTermDayTaxoFacets          6.97   (7.4%)                     6.85   (6.6%)     -1.6% ( -14% -   13%)   0.713
                  OrHighNotLow        277.16   (4.9%)                   272.79   (4.6%)     -1.6% ( -10% -    8%)   0.597
                 OrHighNotHigh        172.23   (4.6%)                   169.55   (3.6%)     -1.6% (  -9% -    6%)   0.552
          HighIntervalsOrdered          2.69   (2.3%)                     2.65   (2.2%)     -1.5% (  -5% -    3%)   0.299
                     OrHighMed         73.52   (4.8%)                    72.66   (3.4%)     -1.2% (  -8% -    7%)   0.657
                  OrNotHighMed        264.98   (3.3%)                   262.00   (4.2%)     -1.1% (  -8% -    6%)   0.639
                     OrHighLow        184.70   (5.4%)                   182.84   (5.1%)     -1.0% ( -10% -   10%)   0.762
                    TermDTSort        102.46   (2.4%)                   101.69   (3.6%)     -0.8% (  -6% -    5%)   0.700
               MedSloppyPhrase          3.36  (11.1%)                     3.34  (11.0%)     -0.5% ( -20% -   24%)   0.946
                       MedTerm        446.96   (6.0%)                   445.14   (5.7%)     -0.4% ( -11% -   11%)   0.912
   BrowseRandomLabelSSDVFacets          2.13   (4.9%)                     2.12   (5.0%)     -0.4% (  -9% -   10%)   0.904
                      Wildcard         72.04   (1.9%)                    71.89   (1.8%)     -0.2% (  -3% -    3%)   0.859
                       Respell         33.84   (0.7%)                    33.78   (0.9%)     -0.2% (  -1% -    1%)   0.730
                       LowTerm        348.43   (4.9%)                   348.30   (3.2%)     -0.0% (  -7% -    8%)   0.988
               LowSloppyPhrase         13.10   (2.9%)                    13.10   (3.4%)      0.0% (  -6% -    6%)   0.999
          HighTermTitleBDVSort          4.63   (2.8%)                     4.63   (2.0%)      0.0% (  -4% -    5%)   0.986
       AndHighMedDayTaxoFacets         32.99   (0.5%)                    33.00   (1.3%)      0.0% (  -1% -    1%)   0.942
                  OrNotHighLow        323.13   (0.7%)                   323.29   (1.7%)      0.1% (  -2% -    2%)   0.951
                   LowSpanNear         43.88   (0.7%)                    43.95   (1.4%)      0.2% (  -1% -    2%)   0.823
                       Prefix3        263.17   (0.6%)                   263.60   (1.7%)      0.2% (  -2% -    2%)   0.839
      AndHighHighDayTaxoFacets          7.51   (1.4%)                     7.53   (1.5%)      0.2% (  -2% -    3%)   0.850
                        Fuzzy1         56.47   (1.2%)                    56.58   (1.1%)      0.2% (  -2% -    2%)   0.802
           MedIntervalsOrdered          6.49   (3.0%)                     6.52   (3.1%)      0.3% (  -5% -    6%)   0.865
                      PKLookup        122.42   (2.6%)                   123.00   (2.5%)      0.5% (  -4% -    5%)   0.774
           LowIntervalsOrdered         17.71   (3.3%)                    17.79   (3.3%)      0.5% (  -5% -    7%)   0.820
                     MedPhrase         73.52   (3.2%)                    74.02   (4.4%)      0.7% (  -6% -    8%)   0.781
        OrHighMedDayTaxoFacets          3.17   (4.7%)                     3.19   (5.8%)      0.7% (  -9% -   11%)   0.836
                  HighSpanNear          3.00   (0.8%)                     3.02   (2.5%)      0.7% (  -2% -    4%)   0.539
         HighTermDayOfYearSort        196.79   (1.1%)                   198.64   (0.9%)      0.9% (  -1% -    2%)   0.132
                    AndHighLow        256.98   (3.6%)                   259.46   (2.4%)      1.0% (  -4% -    7%)   0.617
                    HighPhrase         24.61   (4.0%)                    24.89   (4.3%)      1.1% (  -6% -    9%)   0.667
                   MedSpanNear         11.10   (1.5%)                    11.23   (3.6%)      1.1% (  -3% -    6%)   0.516
          BrowseDateSSDVFacets          0.73   (3.7%)                     0.74   (4.6%)      1.1% (  -6% -    9%)   0.665
                        Fuzzy2         55.52   (1.8%)                    56.47   (1.3%)      1.7% (  -1% -    4%)   0.077
             HighTermTitleSort         53.70   (2.8%)                    54.69   (4.1%)      1.9% (  -4% -    8%)   0.402
                     LowPhrase          8.64   (5.1%)                     8.89   (5.0%)      2.9% (  -6% -   13%)   0.363
     BrowseDayOfYearSSDVFacets          2.77   (3.0%)                     2.91  (11.6%)      5.0% (  -9% -   20%)   0.346
                        IntNRQ         39.03   (6.5%)                    41.43   (7.8%)      6.2% (  -7% -   21%)   0.175
   BrowseRandomLabelTaxoFacets          2.60   (3.0%)                     3.09  (34.7%)     18.6% ( -18% -   58%)   0.234
          BrowseDateTaxoFacets          3.15   (2.7%)                     3.83  (38.6%)     21.6% ( -19% -   64%)   0.211
     BrowseDayOfYearTaxoFacets          3.15   (2.7%)                     3.85  (38.6%)     22.0% ( -18% -   65%)   0.204
         BrowseMonthTaxoFacets          3.24   (1.6%)                     4.83  (58.6%)     49.3% ( -10% -  111%)   0.060
   ```
   
   The benchmark results do not show a clear improvement: none of the differences is statistically significant (every p-value is above 0.05). 🤔
   
   Should I add test cases for this change?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


