[PR] Speed up findNextGEQ by aggresive stepping [lucene]

via GitHub Fri, 30 May 2025 04:32:32 -0700


HUSTERGS opened a new pull request, #14735:
URL: https://github.com/apache/lucene/pull/14735


   ### Description
   This PR propose to increase the step size of `findNextGEQ` to speed up 
adanving within a block, instead of force using AVX512, this implementation 
check two IntVector when target is hit.
   
   ### Benchmark result
   I ran the luceneutil `wikimediumall` corpus with two different configs
   all default config
   ```
    TaskQPS baseline      StdDevQPS my_modified_version      StdDev             
   Pct diff p-value
                  FilteredOrHighMed      129.21     (10.3%)      121.31      
(9.7%)   -6.1% ( -23% -   15%) 0.053
                    CountOrHighHigh      110.26     (14.7%)      105.70     
(13.2%)   -4.1% ( -27% -   27%) 0.350
                     CountOrHighMed      133.31     (13.5%)      128.59     
(11.5%)   -3.5% ( -25% -   24%) 0.372
                 FilteredOrHighHigh       47.14      (7.6%)       45.61      
(5.9%)   -3.3% ( -15% -   11%) 0.130
                    AndHighOrMedMed       73.67     (11.0%)       71.74     
(11.4%)   -2.6% ( -22% -   22%) 0.457
                   IntervalsOrdered       36.73      (8.9%)       35.78      
(8.3%)   -2.6% ( -18% -   16%) 0.341
                   FilteredOr3Terms       78.48      (7.9%)       76.46      
(9.9%)   -2.6% ( -18% -   16%) 0.361
            CountFilteredOrHighHigh       66.75      (6.5%)       65.09      
(8.9%)   -2.5% ( -16% -   13%) 0.314
                    CountAndHighMed      168.16     (14.5%)      164.67     
(15.0%)   -2.1% ( -27% -   31%) 0.656
         FilteredOr2Terms2StopWords       62.18      (7.2%)       60.89      
(7.9%)   -2.1% ( -15% -   14%) 0.386
                FilteredAndHighHigh       36.87      (7.2%)       36.11      
(6.4%)   -2.0% ( -14% -   12%) 0.344
                FilteredOrStopWords       26.24      (4.0%)       25.78      
(4.4%)   -1.8% (  -9% -    6%) 0.181
                   CountAndHighHigh      127.60     (11.0%)      125.52     
(13.9%)   -1.6% ( -23% -   26%) 0.681
                             IntNRQ       94.06      (8.8%)       92.65     
(11.7%)   -1.5% ( -20% -   20%) 0.647
             CountFilteredOrHighMed       83.79     (11.3%)       82.61      
(9.4%)   -1.4% ( -19% -   21%) 0.666
                     FilteredIntNRQ      276.48     (14.4%)      272.68     
(12.6%)   -1.4% ( -24% -   29%) 0.748
                               Term      334.73     (17.1%)      330.46     
(17.7%)   -1.3% ( -30% -   40%) 0.817
                     FilteredPhrase       32.38      (3.8%)       32.06      
(5.4%)   -1.0% (  -9% -    8%) 0.492
                    DismaxOrHighMed       90.22     (15.8%)       89.51     
(15.5%)   -0.8% ( -27% -   36%) 0.874
                     FilteredOrMany       12.59      (4.2%)       12.50      
(3.2%)   -0.7% (  -7% -    6%) 0.530
               FilteredAndStopWords       24.18      (2.9%)       24.07      
(4.4%)   -0.5% (  -7% -    7%) 0.692
                             Phrase       75.02      (8.6%)       74.69      
(8.9%)   -0.4% ( -16% -   18%) 0.872
                   DismaxOrHighHigh       97.71     (12.4%)       97.30     
(16.1%)   -0.4% ( -25% -   32%) 0.925
                            Respell       25.95      (7.9%)       25.90      
(7.3%)   -0.2% ( -14% -   16%) 0.939
                CountFilteredOrMany       19.39      (3.5%)       19.40      
(3.9%)    0.0% (  -7% -    7%) 0.986
                  CombinedOrHighMed       25.89      (5.0%)       26.00      
(3.7%)    0.4% (  -7% -    9%) 0.749
                CountFilteredIntNRQ       53.60      (6.7%)       53.89      
(8.4%)    0.5% ( -13% -   16%) 0.823
                        CountOrMany       21.13      (4.1%)       21.26      
(3.1%)    0.6% (  -6% -    8%) 0.574
                          And3Terms       82.31     (17.4%)       82.98     
(13.7%)    0.8% ( -25% -   38%) 0.868
                        CountPhrase       77.38     (10.1%)       78.10      
(8.2%)    0.9% ( -15% -   21%) 0.751
                          TermB1M1P      436.52     (15.9%)      440.69     
(13.9%)    1.0% ( -24% -   36%) 0.840
                            Prefix3      343.67     (12.8%)      347.40      
(9.4%)    1.1% ( -18% -   26%) 0.760
                       FilteredTerm       96.16     (10.5%)       97.39      
(5.6%)    1.3% ( -13% -   19%) 0.632
                       CombinedTerm       37.36      (6.2%)       37.85      
(7.5%)    1.3% ( -11% -   16%) 0.548
                 CombinedOrHighHigh       14.21      (6.4%)       14.41      
(7.9%)    1.4% ( -12% -   16%) 0.540
                             OrMany       11.84     (10.3%)       12.01     
(10.3%)    1.4% ( -17% -   24%) 0.667
                    FilteredPrefix3      188.23     (11.0%)      191.36     
(11.1%)    1.7% ( -18% -   26%) 0.634
                       SloppyPhrase        9.34      (6.7%)        9.50      
(6.9%)    1.7% ( -11% -   16%) 0.427
                 Or2Terms2StopWords       98.50     (13.0%)      100.26     
(12.6%)    1.8% ( -21% -   31%) 0.659
                   AndMedOrHighHigh       59.17      (9.0%)       60.29      
(8.0%)    1.9% ( -13% -   20%) 0.484
                CountFilteredPhrase       19.99      (3.9%)       20.50      
(3.9%)    2.5% (  -5% -   10%) 0.040
                             Fuzzy2       38.40     (10.0%)       39.49     
(13.0%)    2.8% ( -18% -   28%) 0.440
                 CombinedAndHighMed       65.98      (8.9%)       67.92      
(8.2%)    2.9% ( -13% -   22%) 0.278
                             Fuzzy1       42.53      (7.9%)       43.83      
(8.0%)    3.0% ( -11% -   20%) 0.226
                      TermMonthSort      522.43     (11.8%)      542.15     
(11.4%)    3.8% ( -17% -   30%) 0.305
                           PKLookup      122.68      (7.7%)      127.40      
(9.8%)    3.8% ( -12% -   23%) 0.168
                             Term1M      378.19     (17.8%)      392.87     
(15.9%)    3.9% ( -25% -   45%) 0.468
                         OrHighHigh       42.53     (28.8%)       44.19     
(23.9%)    3.9% ( -37% -   79%) 0.641
                        OrStopWords       39.01     (19.7%)       40.53     
(22.7%)    3.9% ( -32% -   57%) 0.560
                          CountTerm     3260.09      (7.6%)     3389.74      
(7.7%)    4.0% ( -10% -   20%) 0.102
                           Or3Terms       64.82     (16.5%)       67.58     
(13.7%)    4.3% ( -22% -   41%) 0.373
                      TermTitleSort       36.25     (12.7%)       37.86     
(13.4%)    4.5% ( -19% -   35%) 0.281
                          OrHighMed       79.76     (24.0%)       83.39     
(23.7%)    4.6% ( -34% -   68%) 0.546
        FilteredAnd2Terms2StopWords       47.78     (14.0%)       50.01     
(11.3%)    4.7% ( -18% -   34%) 0.244
                            Term100      377.48     (11.7%)      397.62     
(14.0%)    5.3% ( -18% -   35%) 0.191
                         OrHighRare      216.65     (15.2%)      228.37     
(11.2%)    5.4% ( -18% -   37%) 0.201
                CombinedAndHighHigh        9.31     (14.1%)        9.82     
(12.6%)    5.5% ( -18% -   37%) 0.190
                         TermDTSort       71.24     (10.0%)       75.20     
(11.2%)    5.6% ( -14% -   29%) 0.099
                         DismaxTerm      406.52     (11.7%)      429.63      
(7.5%)    5.7% ( -12% -   28%) 0.068
                And2Terms2StopWords       49.26     (14.5%)       52.24     
(14.0%)    6.1% ( -19% -   40%) 0.179
                           SpanNear       10.09      (7.1%)       10.71      
(6.8%)    6.1% (  -7% -   21%) 0.006
                  FilteredAnd3Terms      138.60     (15.6%)      147.87     
(14.2%)    6.7% ( -19% -   43%) 0.156
                           Wildcard       99.31     (14.0%)      106.77     
(13.8%)    7.5% ( -17% -   40%) 0.087
                             IntSet      205.88     (10.7%)      222.75     
(17.1%)    8.2% ( -17% -   40%) 0.069
                  TermDayOfYearSort       82.67     (12.6%)       89.69     
(13.9%)    8.5% ( -16% -   40%) 0.043
                         AndHighMed       98.55     (18.8%)      107.12     
(13.1%)    8.7% ( -19% -   50%) 0.090
                       AndStopWords       41.09     (24.1%)       44.89     
(17.4%)    9.3% ( -25% -   66%) 0.164
                            Term10K      412.49     (18.8%)      458.77     
(13.1%)   11.2% ( -17% -   53%) 0.029
                            TermB1M      357.58     (17.5%)      401.24     
(15.1%)   12.2% ( -17% -   54%) 0.018
                 FilteredAndHighMed       98.32     (15.3%)      110.66     
(14.8%)   12.6% ( -15% -   50%) 0.008
                        AndHighHigh       43.61     (23.7%)       50.86     
(21.4%)   16.6% ( -23% -   80%) 0.020
   ```
   with `-searchConcurrency 0` and set `taskCountPerCat` to 5
   ```
                               TaskQPS baseline      StdDevQPS 
my_modified_version      StdDev                Pct diff p-value
                     FilteredOrMany        5.09      (1.8%)        4.91      
(2.1%)   -3.5% (  -7% -    0%) 0.000
                       FilteredTerm       90.26      (2.7%)       87.55      
(2.4%)   -3.0% (  -7% -    2%) 0.009
                 FilteredOrHighHigh       18.29      (1.4%)       17.77      
(1.0%)   -2.9% (  -5% -    0%) 0.000
                  FilteredOrHighMed       54.74      (1.3%)       53.58      
(1.5%)   -2.1% (  -4% -    0%) 0.001
                   FilteredOr3Terms       59.74      (1.2%)       58.48      
(2.3%)   -2.1% (  -5% -    1%) 0.010
                    CountAndHighMed       91.94      (2.3%)       90.00      
(3.6%)   -2.1% (  -7% -    3%) 0.115
                      TermMonthSort     2557.83      (4.7%)     2510.13      
(2.6%)   -1.9% (  -8% -    5%) 0.268
                     FilteredPhrase       13.08      (1.2%)       12.84      
(1.3%)   -1.8% (  -4% -    0%) 0.001
                CountFilteredPhrase       11.95      (2.2%)       11.76      
(2.5%)   -1.6% (  -6% -    3%) 0.129
         FilteredOr2Terms2StopWords       70.67      (1.0%)       69.58      
(1.7%)   -1.5% (  -4% -    1%) 0.016
                FilteredOrStopWords       11.07      (1.4%)       10.90      
(1.5%)   -1.5% (  -4% -    1%) 0.015
                             Phrase       10.02      (1.6%)        9.90      
(2.4%)   -1.3% (  -5% -    2%) 0.169
                CombinedAndHighHigh        6.26      (2.5%)        6.19      
(3.7%)   -1.2% (  -7% -    5%) 0.376
            CountFilteredOrHighHigh       25.50      (1.8%)       25.24      
(1.0%)   -1.0% (  -3% -    1%) 0.115
                FilteredAndHighHigh       12.87      (6.2%)       12.75      
(5.9%)   -1.0% ( -12% -   11%) 0.716
                 CombinedOrHighHigh        6.32      (2.6%)        6.26      
(3.0%)   -0.9% (  -6% -    4%) 0.450
             CountFilteredOrHighMed       30.05      (2.0%)       29.83      
(1.7%)   -0.7% (  -4% -    2%) 0.375
                CountFilteredIntNRQ       22.54      (3.0%)       22.37      
(1.9%)   -0.7% (  -5% -    4%) 0.521
                 CombinedAndHighMed       25.48      (2.7%)       25.32      
(4.5%)   -0.6% (  -7% -    6%) 0.705
                   CountAndHighHigh       61.76      (1.5%)       61.44      
(1.4%)   -0.5% (  -3% -    2%) 0.412
                           Wildcard       59.06      (3.8%)       58.77      
(4.0%)   -0.5% (  -7% -    7%) 0.779
                            Respell       44.94      (3.2%)       44.74      
(2.4%)   -0.5% (  -5% -    5%) 0.713
                     CountOrHighMed       96.16      (1.6%)       95.73      
(3.0%)   -0.5% (  -5% -    4%) 0.676
                CountFilteredOrMany        6.10      (1.8%)        6.08      
(2.2%)   -0.4% (  -4% -    3%) 0.644
                    CountOrHighHigh       64.03      (1.8%)       63.78      
(2.0%)   -0.4% (  -4% -    3%) 0.650
                    FilteredPrefix3       94.25      (3.2%)       93.94      
(3.2%)   -0.3% (  -6% -    6%) 0.816
                        CountOrMany        6.85      (2.1%)        6.82      
(2.3%)   -0.3% (  -4% -    4%) 0.741
               FilteredAndStopWords       10.20      (5.7%)       10.17      
(5.3%)   -0.3% ( -10% -   11%) 0.901
                    AndHighOrMedMed       18.21      (3.4%)       18.17      
(3.5%)   -0.2% (  -6% -    6%) 0.889
                             Fuzzy2       46.42      (4.4%)       46.37      
(2.8%)   -0.1% (  -6% -    7%) 0.954
                   IntervalsOrdered        2.99      (3.9%)        2.98      
(2.9%)   -0.0% (  -6% -    6%) 0.982
                            Prefix3      100.89      (3.0%)      100.88      
(2.9%)   -0.0% (  -5% -    6%) 0.994
                             Fuzzy1       51.43      (5.6%)       51.50      
(3.8%)    0.1% (  -8% -   10%) 0.951
                      TermTitleSort       72.49      (3.7%)       72.61      
(3.0%)    0.2% (  -6% -    7%) 0.914
                  CombinedOrHighMed       25.51      (3.1%)       25.56      
(3.8%)    0.2% (  -6% -    7%) 0.900
                             IntSet      388.85      (5.6%)      389.85      
(4.4%)    0.3% (  -9% -   10%) 0.909
                           PKLookup      186.01      (7.2%)      186.58      
(8.4%)    0.3% ( -14% -   17%) 0.931
                           SpanNear        3.02      (5.0%)        3.03      
(5.8%)    0.3% (  -9% -   11%) 0.899
                     FilteredIntNRQ       48.19      (2.5%)       48.35      
(2.0%)    0.3% (  -4% -    5%) 0.758
                         OrHighRare      120.88      (5.5%)      121.28      
(5.5%)    0.3% ( -10% -   11%) 0.892
                        CountPhrase        3.31      (3.8%)        3.32      
(3.1%)    0.4% (  -6% -    7%) 0.794
                       CombinedTerm       13.88      (3.2%)       13.94      
(4.9%)    0.4% (  -7% -    8%) 0.822
                             IntNRQ       48.37      (2.5%)       48.58      
(1.8%)    0.4% (  -3% -    4%) 0.661
                  TermDayOfYearSort      381.01      (2.6%)      383.39      
(1.5%)    0.6% (  -3% -    4%) 0.514
                         TermDTSort      181.20      (2.2%)      182.78      
(2.5%)    0.9% (  -3% -    5%) 0.414
                          CountTerm     7148.05      (7.0%)     7230.16      
(5.2%)    1.1% ( -10% -   14%) 0.676
                       SloppyPhrase        1.50      (4.2%)        1.54      
(2.9%)    2.6% (  -4% -   10%) 0.109
                             OrMany        5.28     (12.6%)        5.42      
(9.1%)    2.6% ( -16% -   27%) 0.599
                   AndMedOrHighHigh       19.49      (7.9%)       19.99      
(6.5%)    2.6% ( -10% -   18%) 0.419
                 FilteredAndHighMed       36.56     (11.4%)       37.78      
(7.5%)    3.4% ( -14% -   25%) 0.438
                         DismaxTerm      588.96     (11.4%)      610.47      
(9.5%)    3.7% ( -15% -   27%) 0.437
                   DismaxOrHighHigh       42.39     (13.9%)       43.95     
(11.0%)    3.7% ( -18% -   33%) 0.512
                    DismaxOrHighMed       61.95     (12.7%)       64.44     
(10.8%)    4.0% ( -17% -   31%) 0.446
        FilteredAnd2Terms2StopWords       64.02     (13.8%)       66.89      
(9.4%)    4.5% ( -16% -   32%) 0.395
                             Term1M      525.96     (15.8%)      551.13     
(13.3%)    4.8% ( -21% -   40%) 0.464
                          TermB1M1P      525.44     (16.0%)      552.18     
(13.4%)    5.1% ( -21% -   41%) 0.442
                            Term10K      525.63     (15.7%)      552.56     
(13.3%)    5.1% ( -20% -   40%) 0.431
                               Term      526.88     (16.2%)      554.16     
(13.0%)    5.2% ( -20% -   40%) 0.430
                  FilteredAnd3Terms       76.81     (18.4%)       80.97     
(13.8%)    5.4% ( -22% -   46%) 0.456
                            TermB1M      525.09     (15.7%)      554.15     
(13.1%)    5.5% ( -20% -   40%) 0.392
                 Or2Terms2StopWords       68.38     (16.1%)       72.24     
(11.8%)    5.7% ( -19% -   39%) 0.371
                            Term100      522.93     (15.7%)      553.49     
(13.4%)    5.8% ( -20% -   41%) 0.370
                           Or3Terms       68.30     (20.0%)       72.34     
(14.1%)    5.9% ( -23% -   49%) 0.444
                And2Terms2StopWords       69.76     (15.6%)       74.00     
(11.2%)    6.1% ( -17% -   38%) 0.317
                        OrStopWords        8.15     (21.2%)        8.66     
(16.3%)    6.1% ( -25% -   55%) 0.468
                       AndStopWords        7.81     (18.5%)        8.29     
(13.3%)    6.2% ( -21% -   46%) 0.391
                          And3Terms       72.63     (19.4%)       77.74     
(13.9%)    7.0% ( -22% -   50%) 0.352
                          OrHighMed       76.89     (23.2%)       83.35     
(18.4%)    8.4% ( -26% -   65%) 0.370
                         OrHighHigh       21.09     (24.9%)       23.01     
(19.8%)    9.1% ( -28% -   71%) 0.366
                         AndHighMed       58.17     (22.6%)       63.81     
(18.0%)    9.7% ( -25% -   64%) 0.288
                        AndHighHigh       21.72     (25.2%)       24.07     
(20.8%)   10.8% ( -28% -   75%) 0.295
   ```
   
   
   ### Findings
   Initially, I was playing around with `findNextGEQ` and `AdvanceBenchmark`, 
try to figure out if there is any space for optimization, and finnaly found I 
can increase the step size and manually expand the loop. In my machine, expand 
the loop from 1 to 4 will produce different resut (4 is best)
   ```
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  752.037 ± 33.398  ops/ms  (no 
expand, current implementation)
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  625.892 ± 20.849  ops/ms  
(expand 2, proposed implementation by this PR)
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  893.295 ± 17.879  ops/ms  
(expand 3)
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  955.528 ± 20.036  ops/ms  
(expand 4)
   ```
   so I modify the code and ran luceneutil, but the result is not good as 
`AdvanceBenchmark` showed. At first, I just run the expand-3/4, because only 
3/4 have better performance than the current implementation. This might 
indicate the `AdvanceBenchmark` have some gap with real word search path.
   
   To be honest, the result is not ***THAT*** reliable to me, because the 
`p-value` is still far from 0, but on the other hand, this modification 
actually paid more cost when the target is close, because we need to check two 
vectors now, instead of one,  so the high `p-value` might be acceptable? Please 
let me know if I got anything wrong. :)
   
   <!--
   If this is your first contribution to Lucene, please make sure you have 
reviewed the contribution guide.
   https://github.com/apache/lucene/blob/main/CONTRIBUTING.md
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[PR] Speed up findNextGEQ by aggresive stepping [lucene]

Reply via email to