jpountz opened a new pull request, #13658:
URL: https://github.com/apache/lucene/pull/13658

   This updates file formats to compute prefix sums by summing up 8 deltas per
   long at a time when the number of bits per value is 4 or less, and 4 deltas
   per long at a time when the number of bits per value is between 5 and 11
   inclusive. Otherwise, we keep summing up 2 deltas per long, as we do today.
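
To make the idea concrete, here is a minimal sketch of summing 8 deltas per long. It is not the code from this PR: the class name `PackedPrefixSum`, the block size of 128 and the lane layout (lane `j` of long `i` holds delta `j * 16 + i`) are assumptions chosen for illustration. Under those assumptions each 8-bit lane accumulates at most 16 deltas of at most 15, so the lane total never exceeds 240 and cannot carry into its neighbour. That is one way to see why 8 lanes per long would be limited to 4 bits per value.

```java
// Illustrative sketch only: names, block size and lane layout are assumptions.
final class PackedPrefixSum {
  static final int BLOCK = 128;            // assumed number of deltas per block
  static final int LANES = 8;              // 8 lanes of 8 bits in each long
  static final int LONGS = BLOCK / LANES;  // 16 longs hold the whole block

  /** Returns out[i] = base + deltas[0] + ... + deltas[i]; every delta must fit in 4 bits. */
  static long[] prefixSum(int[] deltas, long base) {
    if (deltas.length != BLOCK) {
      throw new IllegalArgumentException("expected " + BLOCK + " deltas");
    }
    // Pack: lane j of long i holds deltas[j * LONGS + i].
    long[] packed = new long[LONGS];
    for (int lane = 0; lane < LANES; lane++) {
      int shift = 56 - 8 * lane;
      for (int i = 0; i < LONGS; i++) {
        int d = deltas[lane * LONGS + i];
        if (d < 0 || d > 15) {
          throw new IllegalArgumentException("delta does not fit in 4 bits: " + d);
        }
        packed[i] |= ((long) d) << shift;
      }
    }
    // Each long addition advances 8 independent running sums at once.
    for (int i = 1; i < LONGS; i++) {
      packed[i] += packed[i - 1];
    }
    // Unpack the lanes and chain them: each lane starts where the previous one ended.
    long[] out = new long[BLOCK];
    long carry = base;
    for (int lane = 0; lane < LANES; lane++) {
      int shift = 56 - 8 * lane;
      for (int i = 0; i < LONGS; i++) {
        out[lane * LONGS + i] = carry + ((packed[i] >>> shift) & 0xFFL);
      }
      carry = out[lane * LONGS + LONGS - 1];
    }
    return out;
  }

  public static void main(String[] args) {
    int[] deltas = new int[BLOCK];
    java.util.Arrays.fill(deltas, 3);
    long[] sums = prefixSum(deltas, 0L);
    System.out.println("last prefix sum = " + sums[BLOCK - 1]);
  }
}
```

The same reasoning with 16-bit lanes (4 per long, 32 deltas per lane in a block of 128) gives a bound of 32 * 2047 = 65504 for 11-bit deltas, which still fits in 16 bits.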
   
   `PostingDecodingUtil` was slightly modified because more bit widths now
   need to apply several different shifts to the input data. For example, now
   that integers requiring 5 bits per value are stored as 16-bit integers under
   the hood rather than 8-bit integers, we extract the first values by shifting
   by 16-5=11, 16-2*5=6 and 16-3*5=1, and then decode the tail values from the
   remaining bit of each 16-bit integer.
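
As a rough illustration of the shifts described above, the sketch below packs 128 five-bit values into 40 16-bit words and decodes them again. The class name `FiveBitsIn16`, the block size of 128 and the exact ordering of values (including how the leftover bits are grouped into tail values) are assumptions made for this example, not Lucene's actual on-disk layout.

```java
// Illustrative sketch only: names and value ordering are assumptions.
final class FiveBitsIn16 {
  static final int BPV = 5;                    // bits per value
  static final int VALUES = 128;               // assumed block size
  static final int WORDS = VALUES * BPV / 16;  // 40 words of 16 bits each
  static final int FULL = 16 / BPV;            // 3 whole values per word
  static final int MASK = (1 << BPV) - 1;      // 0x1F

  /** Packs 128 values in [0, 31] into 40 ints that each use only their low 16 bits. */
  static int[] encode(int[] values) {
    int[] words = new int[WORDS];
    // Whole values go at shifts 11, 6 and 1.
    for (int k = 0; k < FULL; k++) {
      int shift = 16 - (k + 1) * BPV;
      for (int w = 0; w < WORDS; w++) {
        words[w] |= (values[k * WORDS + w] & MASK) << shift;
      }
    }
    // Bit 0 of each word is left over: 5 consecutive words carry one tail value.
    for (int t = 0; t < VALUES - FULL * WORDS; t++) {
      int v = values[FULL * WORDS + t] & MASK;
      for (int b = 0; b < BPV; b++) {
        words[t * BPV + b] |= (v >>> (BPV - 1 - b)) & 1;
      }
    }
    return words;
  }

  /** Reverses encode(). */
  static int[] decode(int[] words) {
    int[] values = new int[VALUES];
    for (int k = 0; k < FULL; k++) {
      int shift = 16 - (k + 1) * BPV;  // 11, then 6, then 1
      for (int w = 0; w < WORDS; w++) {
        values[k * WORDS + w] = (words[w] >>> shift) & MASK;
      }
    }
    // Rebuild each tail value from the remaining bit of 5 consecutive words.
    for (int t = 0; t < VALUES - FULL * WORDS; t++) {
      int v = 0;
      for (int b = 0; b < BPV; b++) {
        v = (v << 1) | (words[t * BPV + b] & 1);
      }
      values[FULL * WORDS + t] = v;
    }
    return values;
  }
}
```

With this layout, 120 of the 128 values come straight out of a shift and a mask, and only the last 8 need to be reassembled from single bits.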
   
   Micro benchmarks suggest a noticeable speedup for prefix sums at 2, 3 and 4
   bits per value, where we now sum up 8 values at once instead of 2 (roughly
   40% to 48% higher `decodeAndPrefixSum` throughput in the runs below), and a
   minor speedup otherwise. For reference, wikibigall frequently uses 2, 3 or
   4 bits per value on blocks of doc deltas for stop words ("the", "a", "1",
   etc.).
   
   Before (without enabling the vector module):
   
   ```
   Benchmark                                      (bpv)   Mode  Cnt   Score   Error   Units
   PostingIndexInputBenchmark.decode                  2  thrpt   15  55,166 ± 0,480  ops/us
   PostingIndexInputBenchmark.decode                  3  thrpt   15  51,257 ± 0,086  ops/us
   PostingIndexInputBenchmark.decode                  4  thrpt   15  55,769 ± 0,271  ops/us
   PostingIndexInputBenchmark.decode                  5  thrpt   15  50,789 ± 0,062  ops/us
   PostingIndexInputBenchmark.decode                  6  thrpt   15  49,804 ± 0,187  ops/us
   PostingIndexInputBenchmark.decode                  7  thrpt   15  47,552 ± 0,522  ops/us
   PostingIndexInputBenchmark.decode                  8  thrpt   15  61,442 ± 0,330  ops/us
   PostingIndexInputBenchmark.decode                  9  thrpt   15  40,030 ± 0,084  ops/us
   PostingIndexInputBenchmark.decode                 10  thrpt   15  41,480 ± 0,458  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      2  thrpt   15  20,808 ± 0,063  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      3  thrpt   15  20,181 ± 0,184  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      4  thrpt   15  20,931 ± 0,141  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      5  thrpt   15  20,195 ± 0,118  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      6  thrpt   15  20,304 ± 0,185  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      7  thrpt   15  19,334 ± 0,204  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      8  thrpt   15  21,579 ± 0,071  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      9  thrpt   15  17,726 ± 0,253  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum     10  thrpt   15  18,021 ± 0,267  ops/us
   ```
   
   After (without enabling the vector module):
   
   ```
   Benchmark                                      (bpv)   Mode  Cnt   Score   Error   Units
   PostingIndexInputBenchmark.decode                  2  thrpt   15  57,723 ± 0,501  ops/us
   PostingIndexInputBenchmark.decode                  3  thrpt   15  52,526 ± 0,255  ops/us
   PostingIndexInputBenchmark.decode                  4  thrpt   15  57,395 ± 0,424  ops/us
   PostingIndexInputBenchmark.decode                  5  thrpt   15  50,513 ± 0,076  ops/us
   PostingIndexInputBenchmark.decode                  6  thrpt   15  47,176 ± 0,146  ops/us
   PostingIndexInputBenchmark.decode                  7  thrpt   15  44,838 ± 0,138  ops/us
   PostingIndexInputBenchmark.decode                  8  thrpt   15  61,604 ± 0,262  ops/us
   PostingIndexInputBenchmark.decode                  9  thrpt   15  37,737 ± 0,057  ops/us
   PostingIndexInputBenchmark.decode                 10  thrpt   15  37,079 ± 0,364  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      2  thrpt   15  30,823 ± 0,052  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      3  thrpt   15  28,148 ± 0,155  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      4  thrpt   15  30,545 ± 0,059  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      5  thrpt   15  22,332 ± 0,100  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      6  thrpt   15  22,240 ± 0,029  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      7  thrpt   15  21,809 ± 0,038  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      8  thrpt   15  23,279 ± 0,376  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      9  thrpt   15  21,624 ± 0,073  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum     10  thrpt   15  21,348 ± 0,033  ops/us
   ```
   
   After (with the vector module enabled, Apple M3, preferredBitSize=128):
   
   ```
   Benchmark                                      (bpv)   Mode  Cnt   Score   Error   Units
   PostingIndexInputBenchmark.decode                  2  thrpt   15  51,391 ± 0,171  ops/us
   PostingIndexInputBenchmark.decode                  3  thrpt   15  58,821 ± 1,314  ops/us
   PostingIndexInputBenchmark.decode                  4  thrpt   15  65,562 ± 0,412  ops/us
   PostingIndexInputBenchmark.decode                  5  thrpt   15  58,495 ± 1,771  ops/us
   PostingIndexInputBenchmark.decode                  6  thrpt   15  57,617 ± 0,201  ops/us
   PostingIndexInputBenchmark.decode                  7  thrpt   15  52,641 ± 0,330  ops/us
   PostingIndexInputBenchmark.decode                  8  thrpt   15  61,280 ± 0,488  ops/us
   PostingIndexInputBenchmark.decode                  9  thrpt   15  45,320 ± 0,927  ops/us
   PostingIndexInputBenchmark.decode                 10  thrpt   15  46,437 ± 0,131  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      2  thrpt   15  28,179 ± 0,408  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      3  thrpt   15  29,989 ± 0,261  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      4  thrpt   15  31,706 ± 0,279  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      5  thrpt   15  21,305 ± 0,031  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      6  thrpt   15  25,399 ± 0,034  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      7  thrpt   15  25,336 ± 0,081  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      8  thrpt   15  27,047 ± 0,262  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum      9  thrpt   15  24,075 ± 0,044  ops/us
   PostingIndexInputBenchmark.decodeAndPrefixSum     10  thrpt   15  24,549 ± 0,273  ops/us
   ```

