[PR] Speed up advancing within a block, take 2. [lucene]

via GitHub Fri, 25 Oct 2024 10:03:44 -0700


jpountz opened a new pull request, #13958:
URL: https://github.com/apache/lucene/pull/13958


   PR #13692 tried to speed up advancing by using a branchless binary search. 
While this yielded a speedup on my machine, it caused a slowdown on the nightly 
benchmarks.
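
   For readers unfamiliar with the technique, here is a minimal sketch of what 
a branchless binary search over a sorted block of doc IDs can look like. It is 
not the code from PR #13692; the class and method names are hypothetical, and 
whether the JIT actually emits a conditional move instead of a jump is an 
assumption, not a guarantee.

```java
/** Illustrative sketch only; names are hypothetical and not taken from Lucene. */
final class BranchlessSearchSketch {

  /**
   * Returns the index of the first value >= target in sorted[0..length),
   * or length if every value is smaller than target.
   */
  static int firstIndexGEQ(int[] sorted, int length, int target) {
    if (length == 0) {
      return 0;
    }
    int base = 0;
    int len = length;
    // The loop runs a number of times that depends only on length, never on the data.
    while (len > 1) {
      int half = len >>> 1;
      // Simple ternary on a comparison: a candidate for a conditional move
      // (assumption: the JIT may or may not compile it that way).
      base += sorted[base + half - 1] < target ? half : 0;
      len -= half;
    }
    return base + (sorted[base] < target ? 1 : 0);
  }

  public static void main(String[] args) {
    int[] docs = {1, 3, 5, 7, 9};
    System.out.println(firstIndexGEQ(docs, docs.length, 7)); // prints 3
    System.out.println(firstIndexGEQ(docs, docs.length, 10)); // prints 5
  }
}
```

   The iteration count is fixed by the block length, so the only data-dependent 
decision is the ternary inside the loop.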
   
   This PR tries a different approach using vectorization. Experimentation 
suggests that it slightly slows down queries whose advance calls often land on 
the very next doc ID, such as term queries and `OrHighNotXXX` tasks. But it 
speeds up queries that advance to one of the next few doc IDs, such as 
`AndHighHigh`. I think that this is a good trade-off, since it slows down some 
already fast queries in exchange for a speedup on some more expensive ones.
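
   The description above does not spell out the vectorized code, so the sketch 
below is only one plausible shape for it, under stated assumptions: instead of 
stopping at the first doc ID that is greater than or equal to the target, it 
counts how many doc IDs in a fixed range are smaller than the target. A 
loop like this has no early exit, which makes it a candidate for 
auto-vectorization by the JIT (an assumption, not something verified here). The 
class and method names are hypothetical.

```java
/** Illustrative sketch only; names are hypothetical and not taken from the PR. */
final class VectorizedAdvanceSketch {

  /**
   * Returns the index of the first value >= target in sortedDocs[from..to),
   * or to if every value in that range is smaller than target.
   * Requires sortedDocs[from..to) to be sorted in ascending order.
   */
  static int firstIndexGEQ(int[] sortedDocs, int from, int to, int target) {
    int count = 0;
    // No break and no data-dependent exit: every element is compared and the
    // results are summed. Because the range is sorted, the number of values
    // below target is also the offset of the first value >= target.
    for (int i = from; i < to; ++i) {
      count += sortedDocs[i] < target ? 1 : 0;
    }
    return from + count;
  }

  public static void main(String[] args) {
    int[] docs = {2, 4, 8, 16, 32, 64};
    System.out.println(firstIndexGEQ(docs, 1, docs.length, 8)); // prints 2
    System.out.println(firstIndexGEQ(docs, 1, docs.length, 100)); // prints 6
  }
}
```

   A scan of this kind does the same amount of work wherever the target falls 
in the range. That would be consistent with the trade-off reported above: a 
plain linear scan exits almost immediately when the target is the very next doc 
ID, while the fixed-cost scan pays off once the target is a few entries away.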
   
   Here is a `luceneutil` run on `wikibigall` with `-searchConcurrency 0`:
   
   ```
                    Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff                   p-value
           OrHighNotHigh        302.78  (2.4%)                   283.75  (2.9%)     -6.3%  ( -11% -   -1%)  0.000
            OrHighNotMed        384.69  (3.0%)                   363.33  (2.8%)     -5.6%  ( -10% -    0%)  0.000
                 MedTerm        564.86  (2.2%)                   537.04  (3.5%)     -4.9%  ( -10% -    0%)  0.000
                 LowTerm       1014.02  (2.2%)                   967.37  (3.6%)     -4.6%  ( -10% -    1%)  0.000
            OrHighNotLow        446.38  (3.4%)                   427.10  (3.3%)     -4.3%  ( -10% -    2%)  0.000
                HighTerm        485.41  (1.9%)                   464.49  (3.2%)     -4.3%  (  -9% -    0%)  0.000
           OrNotHighHigh        229.78  (2.4%)                   221.51  (3.1%)     -3.6%  (  -8% -    1%)  0.000
            OrNotHighMed        396.63  (2.7%)                   382.41  (3.1%)     -3.6%  (  -9% -    2%)  0.000
                 Prefix3        145.65  (3.6%)                   142.39  (3.7%)     -2.2%  (  -9% -    5%)  0.051
                  IntNRQ        158.04  (4.7%)                   154.77  (5.6%)     -2.1%  ( -11% -    8%)  0.205
               CountTerm       8320.96  (3.2%)                  8198.56  (4.7%)     -1.5%  (  -9% -    6%)  0.246
                PKLookup        273.35  (3.6%)                   269.71  (5.2%)     -1.3%  (  -9% -    7%)  0.345
                Wildcard         83.30  (3.4%)                    82.28  (3.1%)     -1.2%  (  -7% -    5%)  0.234
       HighTermMonthSort       3235.98  (3.1%)                  3198.04  (2.9%)     -1.2%  (  -6% -    4%)  0.215
       HighTermTitleSort        148.94  (2.5%)                   148.38  (2.6%)     -0.4%  (  -5% -    4%)  0.638
          CountOrHighMed        104.51  (2.0%)                   104.22  (1.7%)     -0.3%  (  -3% -    3%)  0.640
    HighTermTitleBDVSort         14.67  (5.3%)                    14.64  (5.9%)     -0.2%  ( -10% -   11%)  0.899
            AndStopWords         30.68  (3.0%)                    30.66  (2.7%)     -0.1%  (  -5% -    5%)  0.941
         CountOrHighHigh         50.17  (2.0%)                    50.19  (1.9%)      0.0%  (  -3% -    3%)  0.947
              OrHighRare        273.82  (4.5%)                   273.96  (3.8%)      0.0%  (  -7% -    8%)  0.971
              TermDTSort        353.37  (6.4%)                   354.23  (6.7%)      0.2%  ( -12% -   14%)  0.907
                  Fuzzy1         77.85  (2.6%)                    78.12  (2.0%)      0.3%  (  -4% -    4%)  0.633
                  Fuzzy2         73.23  (2.5%)                    73.50  (1.9%)      0.4%  (  -3% -    4%)  0.594
   HighTermDayOfYearSort        836.62  (3.1%)                   841.07  (4.0%)      0.5%  (  -6% -    7%)  0.639
     And2Terms2StopWords        154.49  (1.8%)                   155.41  (2.1%)      0.6%  (  -3% -    4%)  0.340
               OrHighLow        771.90  (2.0%)                   778.20  (2.2%)      0.8%  (  -3% -    5%)  0.217
               And3Terms        167.63  (2.3%)                   169.23  (2.2%)      1.0%  (  -3% -    5%)  0.176
             OrStopWords         33.99  (4.6%)                    34.39  (4.1%)      1.2%  (  -7% -   10%)  0.388
         CountAndHighMed        148.01  (2.4%)                   149.91  (1.0%)      1.3%  (  -2% -    4%)  0.025
      Or2Terms2StopWords        156.93  (2.8%)                   159.21  (3.0%)      1.5%  (  -4% -    7%)  0.117
             AndHighHigh         67.06  (1.3%)                    68.07  (1.6%)      1.5%  (  -1% -    4%)  0.001
                  OrMany         18.67  (2.9%)                    18.96  (2.9%)      1.5%  (  -4% -    7%)  0.089
              AndHighMed        185.02  (1.6%)                   189.06  (1.3%)      2.2%  (   0% -    5%)  0.000
              AndHighLow        948.34  (2.6%)                   970.47  (2.6%)      2.3%  (  -2% -    7%)  0.004
              OrHighHigh         68.42  (1.4%)                    70.08  (1.3%)      2.4%  (   0% -    5%)  0.000
                Or3Terms        166.47  (2.7%)                   171.10  (3.1%)      2.8%  (  -2% -    8%)  0.003
            OrNotHighLow        964.69  (3.1%)                   994.46  (3.3%)      3.1%  (  -3% -    9%)  0.002
               OrHighMed        222.32  (2.1%)                   230.93  (1.5%)      3.9%  (   0% -    7%)  0.000
        CountAndHighHigh         48.88  (2.4%)                    52.87  (1.3%)      8.2%  (   4% -   12%)  0.000
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

