[PR] Stop bounding outer window. [lucene]

via GitHub Wed, 17 Jul 2024 14:05:28 -0700


jpountz opened a new pull request, #13582:
URL: https://github.com/apache/lucene/pull/13582


   Currently `MaxScoreBulkScorer` requires its "outer" window to be at least 
`WINDOW_SIZE`. The intuition was that we should make sure to use 
the whole range of the bit set that we are using to collect matches. The 
downside is that this may force us to use an upper level in the skip list that 
has worse upper bounds for the scores.
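
To make the trade-off concrete, here is a minimal toy sketch of the two ways to pick the end of the outer window. It is not the code from this PR or from `MaxScoreBulkScorer`: the class name, the method names, the `blockEnds` input and the value chosen for `WINDOW_SIZE` are all assumptions made for illustration.

```java
/** Toy model of choosing the end of an "outer" scoring window. All names are hypothetical. */
final class OuterWindowSketch {
  /** Assumed size of the bit set used to collect matches; the value is illustrative only. */
  static final int WINDOW_SIZE = 4096;

  /**
   * Unbounded variant: the window ends where the nearest skip block ends.
   * blockEnds holds, for each clause, the last doc ID (inclusive) of the
   * lowest-level skip block that contains the start of the window.
   */
  static int unboundedWindowMax(int[] blockEnds) {
    long windowMax = Integer.MAX_VALUE;
    for (int upTo : blockEnds) {
      windowMax = Math.min(windowMax, upTo + 1L); // upTo is inclusive, windowMax is exclusive
    }
    return (int) windowMax;
  }

  /**
   * Bounded variant: same as above, but the window is stretched so that it
   * covers at least WINDOW_SIZE doc IDs. A wider window may span several skip
   * blocks, so its score upper bound can only be as good as the worst block.
   */
  static int boundedWindowMax(int windowMin, int[] blockEnds) {
    long atLeast = Math.min(Integer.MAX_VALUE, (long) windowMin + WINDOW_SIZE);
    return (int) Math.max(unboundedWindowMax(blockEnds), atLeast);
  }

  public static void main(String[] args) {
    int[] blockEnds = {127, 1023}; // two clauses, hypothetical block boundaries
    System.out.println(unboundedWindowMax(blockEnds));  // 128
    System.out.println(boundedWindowMax(0, blockEnds)); // 4096
  }
}
```

In this toy example the bounded window covers doc IDs 0 to 4095 even though one clause has a skip block ending at doc ID 127, so any score upper bound for that clause has to account for every block in that wider range.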
   
   luceneutil suggests that this is not a good trade-off: removing this 
requirement makes some queries a bit slower (e.g. `Or2Terms2StopWords`, -5.4%), 
but makes `OrHighLow` (+22.8%) and `OrHighRare` (+68.2%) much faster:
   
   ```
                    Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
         CountOrHighHigh       56.30     (36.8%)       52.86     (28.4%)   -6.1% ( -52% -   93%) 0.623
      Or2Terms2StopWords      162.62      (2.7%)      153.87      (4.2%)   -5.4% ( -11% -    1%) 0.000
          CountOrHighMed      103.04     (24.6%)       99.36     (19.0%)   -3.6% ( -37% -   53%) 0.667
                  IntNRQ      344.56      (9.9%)      333.25     (11.0%)   -3.3% ( -22% -   19%) 0.407
             OrStopWords       32.05      (4.6%)       31.15      (6.8%)   -2.8% ( -13% -    9%) 0.201
                 LowTerm      899.50      (4.9%)      877.72      (5.7%)   -2.4% ( -12% -    8%) 0.227
                 MedTerm      506.67      (6.2%)      496.16      (7.1%)   -2.1% ( -14% -   12%) 0.412
                HighTerm      444.00      (6.1%)      435.85      (7.3%)   -1.8% ( -14% -   12%) 0.467
               CountTerm     9416.88      (3.5%)     9260.39      (5.2%)   -1.7% (  -9% -    7%) 0.319
            OrHighNotLow      367.96      (7.0%)      362.09      (6.0%)   -1.6% ( -13% -   12%) 0.515
   HighTermDayOfYearSort      857.10      (3.8%)      849.83      (3.8%)   -0.8% (  -8% -    7%) 0.556
            OrHighNotMed      328.60      (7.0%)      325.94      (6.1%)   -0.8% ( -12% -   13%) 0.743
                 Prefix3      287.21      (2.4%)      285.09      (1.7%)   -0.7% (  -4% -    3%) 0.349
                Or3Terms      156.94      (3.1%)      155.80      (4.0%)   -0.7% (  -7% -    6%) 0.589
                  Fuzzy2       87.16      (1.3%)       86.59      (1.1%)   -0.7% (  -3% -    1%) 0.156
              OrHighHigh       75.93      (2.3%)       75.62      (2.1%)   -0.4% (  -4% -    4%) 0.611
    HighTermTitleBDVSort       13.53      (3.3%)       13.48      (6.8%)   -0.4% ( -10% -   10%) 0.859
         CountAndHighMed      120.03      (2.9%)      119.60      (1.9%)   -0.4% (  -5% -    4%) 0.705
                  Fuzzy1       89.56      (1.1%)       89.25      (1.2%)   -0.3% (  -2% -    1%) 0.422
            AndStopWords       29.15      (3.5%)       29.05      (3.6%)   -0.3% (  -7% -    7%) 0.811
               And3Terms      155.12      (2.2%)      154.91      (2.5%)   -0.1% (  -4% -    4%) 0.883
     And2Terms2StopWords      149.66      (2.5%)      149.48      (2.3%)   -0.1% (  -4% -    4%) 0.897
                 Respell       49.39      (1.4%)       49.34      (1.3%)   -0.1% (  -2% -    2%) 0.826
                Wildcard       77.63      (3.2%)       77.56      (3.0%)   -0.1% (  -6% -    6%) 0.935
            OrNotHighLow      964.64      (2.9%)      964.50      (2.2%)   -0.0% (  -4% -    5%) 0.987
           OrHighNotHigh      233.73      (7.3%)      233.73      (6.5%)    0.0% ( -12% -   14%) 1.000
        CountAndHighHigh       41.12      (2.4%)       41.15      (2.2%)    0.1% (  -4% -    4%) 0.937
       HighTermMonthSort     3589.46      (1.3%)     3594.78      (2.7%)    0.1% (  -3% -    4%) 0.853
              TermDTSort      360.82      (8.1%)      362.55      (5.2%)    0.5% ( -11% -   14%) 0.852
                PKLookup      285.81      (1.8%)      287.35      (1.7%)    0.5% (  -2% -    4%) 0.418
           OrNotHighHigh      265.21      (7.0%)      266.65      (6.3%)    0.5% ( -11% -   14%) 0.830
             AndHighHigh       68.85      (2.6%)       69.25      (2.2%)    0.6% (  -4% -    5%) 0.527
                  Phrase       11.48      (3.1%)       11.56      (4.1%)    0.6% (  -6% -    8%) 0.644
              AndHighMed      147.82      (2.3%)      148.76      (2.0%)    0.6% (  -3% -    5%) 0.438
              AndHighLow      759.97      (3.8%)      765.97      (2.2%)    0.8% (  -5% -    7%) 0.502
       HighTermTitleSort      141.20      (2.9%)      142.32      (3.1%)    0.8% (  -5% -    6%) 0.479
            OrNotHighMed      377.50      (6.6%)      380.75      (5.8%)    0.9% ( -10% -   14%) 0.714
               OrHighMed      228.73      (2.9%)      231.40      (2.4%)    1.2% (  -3% -    6%) 0.243
             CountPhrase        3.14     (10.9%)        3.28     (11.3%)    4.6% ( -15% -   30%) 0.275
               OrHighLow      499.45      (2.5%)      613.28      (2.9%)   22.8% (  17% -   28%) 0.000
              OrHighRare      135.78      (7.4%)      228.35     (14.3%)   68.2% (  43% -   96%) 0.000
   ```
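
As a quick sanity check on how to read the table, the sketch below recomputes the `Pct diff` column for the two significant speedups, assuming it is the relative change in mean QPS between the baseline and the modified version. The class and method names are hypothetical and this is not luceneutil code.

```java
import java.util.Locale;

/** Recomputes the relative QPS change for two rows of the table. Names are hypothetical. */
final class PctDiffSketch {
  /** Relative change of the candidate QPS over the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS values copied from the OrHighLow and OrHighRare rows above.
    System.out.printf(Locale.ROOT, "OrHighLow  %.1f%%%n", pctDiff(499.45, 613.28)); // 22.8%
    System.out.printf(Locale.ROOT, "OrHighRare %.1f%%%n", pctDiff(135.78, 228.35)); // 68.2%
  }
}
```

Both values match the table, while most of the negative rows have p-values well above 0.05 and ranges that straddle zero.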


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


