Re: [PR] Speed up findNextGEQ by aggresive stepping [lucene]

via GitHub Fri, 30 May 2025 08:54:55 -0700


HUSTERGS commented on PR #14735:
URL: https://github.com/apache/lucene/pull/14735#issuecomment-2922761475


   I added a single-block pre-check, so if the target is located in the first 
block, we do not pay the double cost. This change seems quite useful because 
it reduces the cost of the common case, and the worst case is also mitigated. 
`AdvanceBenchmark` now shows a better result: about 6.7% above the current 
implementation, whereas the previously proposed implementation, without this 
check, was about 16.8% below it.
   ```
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  752.037 ± 33.398  ops/ms  (no expand, current implementation)
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  625.892 ± 20.849  ops/ms  (expand 2, previous proposed implementation)
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  802.733 ± 25.096  ops/ms  (expand 2, currently proposed implementation by this PR)
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  893.295 ± 17.879  ops/ms  (expand 3)
   AdvanceBenchmark.vectorUtilSearch  thrpt   15  955.528 ± 20.036  ops/ms  (expand 4)
   ```
   The luceneutil results also confirm this, with `taskCountPerCat` set to 5 
and concurrent search disabled:
   ```
                          Task  QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff (range)  p-value
               CountAndHighMed       93.29      (3.5%)       90.14      (4.3%)   -3.4% ( -10% -    4%) 0.086
                CountOrHighMed       97.01      (2.4%)       94.42      (4.3%)   -2.7% (  -9% -    4%) 0.125
           FilteredAndHighHigh       13.18      (2.2%)       12.94      (1.6%)   -1.8% (  -5% -    2%) 0.062
                FilteredOrMany        5.10      (1.9%)        5.01      (2.8%)   -1.7% (  -6% -    3%) 0.164
                       Prefix3       98.88      (2.1%)       97.33      (4.5%)   -1.6% (  -7% -    5%) 0.371
               FilteredPrefix3       92.03      (2.0%)       90.85      (3.9%)   -1.3% (  -7% -    4%) 0.408
              FilteredOr3Terms       59.68      (1.9%)       59.00      (2.6%)   -1.1% (  -5% -    3%) 0.318
                  SloppyPhrase        1.54      (4.2%)        1.52      (3.3%)   -1.1% (  -8% -    6%) 0.579
               AndHighOrMedMed       17.85      (2.0%)       17.66      (1.6%)   -1.0% (  -4% -    2%) 0.263
          FilteredAndStopWords       10.36      (2.1%)       10.25      (1.6%)   -1.0% (  -4% -    2%) 0.288
            CombinedAndHighMed       24.68      (3.8%)       24.44      (2.9%)   -1.0% (  -7% -    5%) 0.554
           CombinedAndHighHigh        6.15      (2.7%)        6.10      (1.9%)   -0.9% (  -5% -    3%) 0.457
            FilteredOrHighHigh       18.16      (1.8%)       18.02      (1.7%)   -0.8% (  -4% -    2%) 0.379
             FilteredOrHighMed       54.23      (2.9%)       53.82      (2.6%)   -0.8% (  -6% -    4%) 0.582
                   CountOrMany        6.85      (1.7%)        6.80      (3.0%)   -0.7% (  -5% -    4%) 0.589
                 TermMonthSort     2508.14      (3.7%)     2494.67      (3.7%)   -0.5% (  -7% -    7%) 0.770
                      PKLookup      185.70      (8.8%)      185.00      (8.7%)   -0.4% ( -16% -   18%) 0.932
                 TermTitleSort       71.42      (3.2%)       71.20      (3.3%)   -0.3% (  -6% -    6%) 0.851
                    TermDTSort      179.37      (4.6%)      178.92      (2.1%)   -0.3% (  -6% -    6%) 0.887
             TermDayOfYearSort      375.84      (1.6%)      375.20      (0.7%)   -0.2% (  -2% -    2%) 0.786
                    OrHighRare      123.25      (4.9%)      123.10      (2.1%)   -0.1% (  -6% -    7%) 0.948
    FilteredOr2Terms2StopWords       69.66      (5.5%)       69.60      (5.6%)   -0.1% ( -10% -   11%) 0.977
                      Wildcard       57.13      (3.1%)       57.12      (3.1%)   -0.0% (  -6% -    6%) 0.988
                   CountPhrase        3.25      (4.4%)        3.25      (2.8%)   -0.0% (  -6% -    7%) 0.994
                  FilteredTerm       88.15      (3.3%)       88.15      (4.2%)    0.0% (  -7% -    7%) 0.999
               CountOrHighHigh       63.31      (2.4%)       63.34      (2.9%)    0.0% (  -5% -    5%) 0.976
           FilteredOrStopWords       11.01      (3.3%)       11.03      (2.0%)    0.2% (  -4% -    5%) 0.900
                FilteredPhrase       12.75      (2.4%)       12.78      (2.6%)    0.2% (  -4% -    5%) 0.880
                  CombinedTerm       13.47      (3.7%)       13.50      (3.9%)    0.2% (  -7% -    8%) 0.918
              IntervalsOrdered        2.95      (3.3%)        2.96      (3.5%)    0.3% (  -6% -    7%) 0.871
                       Respell       43.81      (1.9%)       43.97      (2.1%)    0.4% (  -3% -    4%) 0.703
              CountAndHighHigh       60.80      (2.5%)       61.04      (2.8%)    0.4% (  -4% -    5%) 0.763
                      SpanNear        3.10      (2.3%)        3.12      (1.9%)    0.6% (  -3% -    4%) 0.552
                        IntNRQ       48.14      (1.8%)       48.46      (2.4%)    0.7% (  -3% -    4%) 0.522
           CountFilteredIntNRQ       22.06      (1.2%)       22.21      (1.5%)    0.7% (  -2% -    3%) 0.306
                     CountTerm     7031.70      (2.8%)     7085.76      (5.2%)    0.8% (  -7% -    9%) 0.715
       CountFilteredOrHighHigh       25.07      (1.1%)       25.27      (1.4%)    0.8% (  -1% -    3%) 0.194
                        IntSet      391.92      (4.6%)      395.19      (5.4%)    0.8% (  -8% -   11%) 0.739
           CountFilteredOrMany        6.01      (1.8%)        6.06      (2.2%)    0.9% (  -3% -    4%) 0.394
                FilteredIntNRQ       47.96      (2.1%)       48.37      (2.3%)    0.9% (  -3% -    5%) 0.443
        CountFilteredOrHighMed       29.55      (0.8%)       29.81      (1.7%)    0.9% (  -1% -    3%) 0.196
                        Phrase        9.76      (2.7%)        9.87      (2.5%)    1.1% (  -4% -    6%) 0.414
                        Fuzzy2       45.67      (3.3%)       46.25      (4.1%)    1.3% (  -5% -    8%) 0.495
           CountFilteredPhrase       11.67      (3.4%)       11.82      (3.9%)    1.3% (  -5% -    8%) 0.483
                        Fuzzy1       50.95      (4.0%)       51.97      (3.5%)    2.0% (  -5% -    9%) 0.286
             CombinedOrHighMed       24.10      (9.1%)       24.73      (4.2%)    2.6% (  -9% -   17%) 0.464
            CombinedOrHighHigh        5.99      (9.2%)        6.18      (2.6%)    3.1% (  -7% -   16%) 0.355
            FilteredAndHighMed       38.19      (8.4%)       39.39      (2.1%)    3.1% (  -6% -   14%) 0.306
              AndMedOrHighHigh       19.76      (7.8%)       20.66      (2.3%)    4.6% (  -5% -   15%) 0.112
                        OrMany        5.44      (8.2%)        5.70      (4.3%)    4.7% (  -7% -   18%) 0.149
   FilteredAnd2Terms2StopWords       66.23     (12.4%)       69.62      (5.8%)    5.1% ( -11% -   26%) 0.290
               DismaxOrHighMed       63.25     (11.6%)       66.58      (3.5%)    5.3% (  -8% -   22%) 0.219
                    DismaxTerm      593.96      (8.5%)      625.85      (6.5%)    5.4% (  -8% -   22%) 0.157
           And2Terms2StopWords       73.30     (12.9%)       77.34      (8.0%)    5.5% ( -13% -   30%) 0.305
              DismaxOrHighHigh       43.52     (12.1%)       46.09      (3.1%)    5.9% (  -8% -   23%) 0.180
                       TermB1M      526.21     (13.7%)      561.74     (12.2%)    6.8% ( -16% -   37%) 0.298
                       Term10K      527.47     (13.8%)      563.17     (12.3%)    6.8% ( -16% -   38%) 0.300
             FilteredAnd3Terms       80.07     (15.7%)       85.52      (2.0%)    6.8% (  -9% -   29%) 0.224
                     TermB1M1P      529.07     (13.9%)      565.30     (12.8%)    6.8% ( -17% -   38%) 0.305
                  AndStopWords        8.52     (14.5%)        9.10      (2.9%)    6.9% (  -9% -   28%) 0.190
            Or2Terms2StopWords       71.07     (14.9%)       76.05      (7.5%)    7.0% ( -13% -   34%) 0.235
                        Term1M      526.93     (13.9%)      565.31     (11.8%)    7.3% ( -16% -   38%) 0.258
                       Term100      525.57     (13.3%)      563.91     (12.0%)    7.3% ( -15% -   37%) 0.249
                     And3Terms       78.11     (15.0%)       83.90      (1.8%)    7.4% (  -8% -   28%) 0.167
                          Term      526.25     (13.5%)      566.72     (12.1%)    7.7% ( -15% -   38%) 0.230
                      Or3Terms       72.68     (16.4%)       78.71      (1.6%)    8.3% (  -8% -   31%) 0.155
                   OrStopWords        8.89     (19.6%)        9.76      (3.1%)    9.7% ( -10% -   40%) 0.165
                    AndHighMed       61.59     (18.7%)       67.79      (2.1%)   10.1% (  -9% -   38%) 0.131
                     OrHighMed       80.86     (19.6%)       89.47      (2.8%)   10.6% (  -9% -   41%) 0.129
                    OrHighHigh       22.38     (21.7%)       24.98      (1.6%)   11.6% (  -9% -   44%) 0.131
                   AndHighHigh       23.37     (21.6%)       26.10      (1.7%)   11.7% (  -9% -   44%) 0.128
   ```
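
For readers following along, here is a minimal scalar sketch of the single-block pre-check combined with expanded stepping. It is an illustration only, not the code in this PR: the class name `FindNextGEQSketch`, the constants `BLOCK` and `EXPAND`, and the helper `scan` are all hypothetical, and the real implementation works on SIMD vectors rather than a plain loop. The sketch assumes the buffer is sorted in ascending order, so checking the last element of a block is enough to know whether the target falls inside it.

```java
final class FindNextGEQSketch {
  // Hypothetical values: BLOCK stands in for one vector's worth of ints,
  // EXPAND for the "expand N" factor from the benchmark labels.
  static final int BLOCK = 8;
  static final int EXPAND = 2;

  /** Returns the first index in [from, to) with buffer[index] >= target, or `to` if none. */
  static int findNextGEQ(int[] buffer, int target, int from, int to) {
    int i = from;

    // Single-block pre-check: if the target lies in the first block,
    // resolve it here and never pay for the wider step.
    if (i + BLOCK <= to) {
      if (buffer[i + BLOCK - 1] >= target) {
        return scan(buffer, target, i, i + BLOCK);
      }
      i += BLOCK; // every value in the first block is < target
    }

    // Expanded stepping: skip EXPAND blocks at a time while their
    // last value is still below the target.
    int step = BLOCK * EXPAND;
    while (i + step <= to) {
      if (buffer[i + step - 1] >= target) {
        return scan(buffer, target, i, i + step);
      }
      i += step;
    }

    // Tail: fewer than `step` values remain.
    return scan(buffer, target, i, to);
  }

  // Linear scan standing in for the vectorized comparison.
  private static int scan(int[] buffer, int target, int from, int to) {
    for (int i = from; i < to; i++) {
      if (buffer[i] >= target) {
        return i;
      }
    }
    return to;
  }

  public static void main(String[] args) {
    int[] docs = new int[40];
    for (int i = 0; i < docs.length; i++) {
      docs[i] = 2 * i; // sorted values 0, 2, 4, ..., 78
    }
    System.out.println(findNextGEQ(docs, 5, 0, docs.length));   // 3, found by the pre-check
    System.out.println(findNextGEQ(docs, 31, 0, docs.length));  // 16, found by an expanded step
    System.out.println(findNextGEQ(docs, 100, 0, docs.length)); // 40, no match
  }
}
```

In this sketch a target in the first block costs one boundary comparison plus a scan of `BLOCK` values, instead of a scan over `BLOCK * EXPAND` values, which is the "double cost" the pre-check is meant to avoid.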
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


