[PR] Optimize decoding blocks of postings using the vector API. [lucene]

via GitHub Tue, 06 Aug 2024 07:47:42 -0700


jpountz opened a new pull request, #13636:
URL: https://github.com/apache/lucene/pull/13636


   Our postings use a layout that lets Java's auto-vectorization decode them 
reasonably fast. But we can make decoding a bit faster by using explicit 
vectorization on `MemorySegment`:
    - vectorizing directly from the MemorySegment instead of first copying data 
into a long[],
    - decoding more longs than requested instead of forcing the last longs to 
be handled via scalar instructions.
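
The second point can be illustrated without the vector API. The following is only a minimal sketch in plain Java, not the code from this PR: `OverDecodeSketch`, `LANES`, `roundUpToLanes` and `shiftAndMask` are hypothetical names, and the shift-and-mask step merely stands in for one step of bit-unpacking. It assumes the input and output buffers are padded to a multiple of the lane count, so the loop always processes whole chunks and never needs a scalar tail.

```java
import java.util.Arrays;

public class OverDecodeSketch {
  // Hypothetical lane count: how many longs one "vector" step handles.
  static final int LANES = 4;

  // Rounds count up to the next multiple of LANES.
  static int roundUpToLanes(int count) {
    return (count + LANES - 1) / LANES * LANES;
  }

  // Extracts bits [shift, shift + bitsPerValue) from each input long
  // (bitsPerValue must be below 64). Only whole chunks of LANES longs are
  // processed, so up to LANES - 1 extra longs get decoded; both arrays must
  // hold at least roundUpToLanes(count) longs.
  static void shiftAndMask(long[] in, int count, int shift, int bitsPerValue, long[] out) {
    long mask = (1L << bitsPerValue) - 1;
    int upTo = roundUpToLanes(count);
    for (int i = 0; i < upTo; i += LANES) {
      // Fixed trip count: no remainder loop for the last few values.
      for (int lane = 0; lane < LANES; lane++) {
        out[i + lane] = (in[i + lane] >>> shift) & mask;
      }
    }
  }

  public static void main(String[] args) {
    int count = 6; // requested values; buffers are padded to 8 longs
    long[] in = new long[roundUpToLanes(count)];
    for (int i = 0; i < count; i++) {
      in[i] = ((long) (i + 1) << 8) | 0xFF;
    }
    long[] out = new long[in.length];
    shiftAndMask(in, count, 8, 5, out);
    // Callers only read the first `count` values; the padding is ignored.
    System.out.println(Arrays.toString(Arrays.copyOf(out, count)));
  }
}
```

The trade-off is a little wasted work on the padding slots in exchange for a loop body with no special case for the tail.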
   
   This approach only works when the `Directory` uses `MemorySegmentIndexInput` 
under the hood, i.e. `MMapDirectory` on JDK 21+. The `ForUtilBenchmark` 
microbenchmark reports the following results:
   
   ```
   Before
   
   Benchmark                            (bpv)   Mode  Cnt   Score   Error   Units
   ForUtilBenchmark.decode                  5  thrpt   15  36.244 ± 0.742  ops/us
   ForUtilBenchmark.decode                  6  thrpt   15  35.406 ± 0.170  ops/us
   ForUtilBenchmark.decode                  7  thrpt   15  33.396 ± 0.291  ops/us
   ForUtilBenchmark.decode                  8  thrpt   15  41.064 ± 2.269  ops/us
   ForUtilBenchmark.decode                  9  thrpt   15  30.288 ± 0.172  ops/us
   ForUtilBenchmark.decode                 10  thrpt   15  31.203 ± 0.791  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      5  thrpt   15  19.421 ± 0.272  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      6  thrpt   15  18.932 ± 0.356  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      7  thrpt   15  16.824 ± 1.080  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      8  thrpt   15  21.085 ± 0.316  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      9  thrpt   15  15.874 ± 2.085  ops/us
   ForUtilBenchmark.decodeAndPrefixSum     10  thrpt   15  17.827 ± 0.210  ops/us
   
   After
   
   Benchmark                            (bpv)   Mode  Cnt   Score   Error   Units
   ForUtilBenchmark.decode                  5  thrpt   15  40.774 ± 1.170  ops/us
   ForUtilBenchmark.decode                  6  thrpt   15  44.392 ± 0.748  ops/us
   ForUtilBenchmark.decode                  7  thrpt   15  43.050 ± 0.586  ops/us
   ForUtilBenchmark.decode                  8  thrpt   15  49.773 ± 0.376  ops/us
   ForUtilBenchmark.decode                  9  thrpt   15  36.264 ± 0.434  ops/us
   ForUtilBenchmark.decode                 10  thrpt   15  38.403 ± 1.388  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      5  thrpt   15  19.362 ± 0.573  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      6  thrpt   15  18.402 ± 3.128  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      7  thrpt   15  19.518 ± 0.680  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      8  thrpt   15  21.388 ± 0.228  ops/us
   ForUtilBenchmark.decodeAndPrefixSum      9  thrpt   15  18.126 ± 0.625  ops/us
   ForUtilBenchmark.decodeAndPrefixSum     10  thrpt   15  19.161 ± 0.379  ops/us
   ```
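
To read the two tables side by side, the relative change in throughput per `bpv` can be computed from the reported scores. The snippet below is just a convenience sketch: `ForUtilSpeedup` and `pctChange` are hypothetical names, the scores are the `ForUtilBenchmark.decode` rows copied from the tables above, and the reported error margins are ignored, so small differences should not be over-interpreted.

```java
import java.util.Locale;

public class ForUtilSpeedup {
  // Relative change of a throughput score, in percent.
  static double pctChange(double before, double after) {
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    int[] bpv = {5, 6, 7, 8, 9, 10};
    // ForUtilBenchmark.decode scores in ops/us, before and after the change.
    double[] before = {36.244, 35.406, 33.396, 41.064, 30.288, 31.203};
    double[] after = {40.774, 44.392, 43.050, 49.773, 36.264, 38.403};
    for (int i = 0; i < bpv.length; i++) {
      System.out.printf(
          Locale.ROOT, "bpv=%d: %+.1f%%%n", bpv[i], pctChange(before[i], after[i]));
    }
  }
}
```

The same helper applies to the `decodeAndPrefixSum` rows, where several before/after differences are within the reported error.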
   
   And `luceneutil` on `wikibigall` reports the following (only look at queries 
with low p-values):
   
   ```
                               Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                             IntNRQ      132.60     (23.4%)      129.72     (24.1%)   -2.2% ( -40% -   59%) 0.772
                          CountTerm     9419.14      (3.5%)     9377.10      (3.1%)   -0.4% (  -6% -    6%) 0.669
                            Respell       55.22      (1.2%)       55.24      (1.5%)    0.0% (  -2% -    2%) 0.931
                             Fuzzy2       71.83      (1.1%)       71.87      (1.1%)    0.1% (  -2% -    2%) 0.871
                         TermDTSort      360.85      (6.8%)      361.40      (5.6%)    0.2% ( -11% -   13%) 0.939
                  HighTermMonthSort     3491.38      (1.4%)     3499.56      (2.4%)    0.2% (  -3% -    4%) 0.702
                             Fuzzy1       89.72      (1.1%)       90.06      (1.2%)    0.4% (  -1% -    2%) 0.285
                    MedSloppyPhrase        7.70      (3.6%)        7.73      (6.6%)    0.4% (  -9% -   10%) 0.791
                           PKLookup      288.99      (1.7%)      290.39      (1.5%)    0.5% (  -2% -    3%) 0.332
              HighTermDayOfYearSort      829.90      (3.4%)      836.92      (3.4%)    0.8% (  -5% -    7%) 0.436
                         HighPhrase       66.40      (2.8%)       66.97      (5.1%)    0.9% (  -6% -    9%) 0.508
                            Prefix3      185.20      (3.1%)      186.81      (2.8%)    0.9% (  -4% -    7%) 0.352
                  HighTermTitleSort      162.00      (4.1%)      163.43      (5.0%)    0.9% (  -7% -   10%) 0.543
                   HighSloppyPhrase        6.91      (3.0%)        6.98      (6.6%)    1.0% (  -8% -   10%) 0.521
                           Wildcard       53.89      (3.9%)       54.64      (3.5%)    1.4% (  -5% -    9%) 0.237
                         OrHighRare      252.56      (9.3%)      257.10      (9.6%)    1.8% ( -15% -   22%) 0.548
                MedIntervalsOrdered        3.33      (4.7%)        3.40      (5.6%)    1.9% (  -8% -   12%) 0.259
                       HighSpanNear        4.60      (1.5%)        4.69      (1.4%)    1.9% (  -1% -    4%) 0.000
                      OrNotHighHigh      221.99      (3.7%)      226.47      (5.4%)    2.0% (  -6% -   11%) 0.164
                          OrHighLow      812.48      (2.0%)      828.98      (2.3%)    2.0% (  -2% -    6%) 0.003
               HighIntervalsOrdered        1.36      (5.3%)        1.39      (6.4%)    2.0% (  -9% -   14%) 0.272
                LowIntervalsOrdered        4.38      (4.1%)        4.47      (4.5%)    2.1% (  -6% -   11%) 0.130
                      OrHighNotHigh      207.32      (4.1%)      211.83      (5.6%)    2.2% (  -7% -   12%) 0.159
                       OrNotHighMed      334.67      (3.7%)      342.02      (5.8%)    2.2% (  -7% -   12%) 0.152
                        MedSpanNear       14.99      (1.8%)       15.32      (1.4%)    2.2% (  -1% -    5%) 0.000
                          MedPhrase       14.61      (3.2%)       14.97      (4.6%)    2.5% (  -5% -   10%) 0.052
                          LowPhrase       71.73      (3.3%)       73.49      (4.4%)    2.5% (  -5% -   10%) 0.046
                        AndHighHigh       70.54      (2.0%)       72.44      (1.3%)    2.7% (   0% -    6%) 0.000
                         OrHighHigh       67.00      (1.7%)       68.80      (1.6%)    2.7% (   0% -    6%) 0.000
                          OrHighMed      191.06      (1.9%)      196.57      (1.8%)    2.9% (   0% -    6%) 0.000
                    LowSloppyPhrase       24.50      (2.9%)       25.27      (4.1%)    3.2% (  -3% -   10%) 0.005
                         AndHighMed      151.19      (2.2%)      156.00      (1.4%)    3.2% (   0% -    6%) 0.000
                        CountPhrase        3.18      (9.0%)        3.29      (9.1%)    3.3% ( -13% -   23%) 0.244
                 Or2Terms2StopWords      160.27      (4.4%)      165.68      (1.5%)    3.4% (  -2% -    9%) 0.001
                And2Terms2StopWords      157.18      (2.8%)      162.48      (1.4%)    3.4% (   0% -    7%) 0.000
                       OrHighNotMed      301.42      (4.1%)      311.77      (6.1%)    3.4% (  -6% -   14%) 0.038
                        LowSpanNear        9.83      (1.5%)       10.17      (1.4%)    3.5% (   0% -    6%) 0.000
                   CountAndHighHigh       46.76      (1.3%)       48.48      (2.3%)    3.7% (   0% -    7%) 0.000
               HighTermTitleBDVSort       11.40      (5.6%)       11.82      (8.1%)    3.7% (  -9% -   18%) 0.092
                       OrHighNotLow      342.09      (4.6%)      355.83      (7.0%)    4.0% (  -7% -   16%) 0.033
                          And3Terms      165.26      (3.0%)      171.96      (1.6%)    4.1% (   0% -    8%) 0.000
                           Or3Terms      165.25      (4.5%)      171.96      (1.5%)    4.1% (  -1% -   10%) 0.000
                         AndHighLow      993.19      (2.8%)     1034.27      (2.8%)    4.1% (  -1% -   10%) 0.000
                       OrNotHighLow      989.09      (3.1%)     1030.58      (3.5%)    4.2% (  -2% -   11%) 0.000
                       AndStopWords       29.62      (4.1%)       30.97      (1.9%)    4.6% (  -1% -   11%) 0.000
                        OrStopWords       32.89      (7.0%)       34.41      (2.6%)    4.6% (  -4% -   15%) 0.006
                            LowTerm      978.60      (3.0%)     1023.81      (6.1%)    4.6% (  -4% -   14%) 0.002
                    CountAndHighMed      140.50      (1.6%)      147.18      (2.6%)    4.8% (   0% -    9%) 0.000
                    CountOrHighHigh       57.94     (15.1%)       60.85     (16.7%)    5.0% ( -23% -   43%) 0.319
                     CountOrHighMed      113.79     (11.2%)      120.21     (13.2%)    5.6% ( -16% -   33%) 0.145
                           HighTerm      363.62      (5.0%)      384.13      (8.2%)    5.6% (  -7% -   19%) 0.009
                            MedTerm      546.01      (4.2%)      580.61      (7.9%)    6.3% (  -5% -   19%) 0.002
   ```
   
   


