jpountz opened a new pull request, #13658: URL: https://github.com/apache/lucene/pull/13658
This updates file formats to compute prefix sums by summing 8 deltas per long at once when the number of bits per value is 4 or less, and 4 deltas per long at once when the number of bits per value is between 5 and 11 (inclusive). Otherwise, we keep summing 2 deltas per long as we do today.

`PostingDecodingUtil` was slightly modified because more bits-per-value settings now need to apply several different shifts to the input data. E.g. now that we store integers that require 5 bits per value as 16-bit integers under the hood rather than 8-bit integers, we extract the first values by shifting by 16-5=11, 16-2*5=6 and 16-3*5=1, and then decode the tail values from the remaining bit of each 16-bit integer.

Micro benchmarks suggest a noticeable speedup for prefix sums at 2, 3 and 4 bits per value, where we now sum 8 values at once instead of 2 (`decodeAndPrefixSum` throughput is roughly 40-50% higher in the runs below without the vector module), and a minor speedup otherwise. For reference, wikibigall frequently uses 2, 3 or 4 bits per value on blocks of doc deltas of stop words ("the", "a", "1", etc.).
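To illustrate the "8 deltas per long" idea, here is a minimal sketch, not the code from this PR. It assumes a 128-value block whose deltas are each below 16 (4 bits per value) and packs them into 16 longs, one delta per 8-bit lane. The class name, method names and exact lane layout are hypothetical. Under that assumption each lane accumulates only 16 deltas, so a lane total is at most 16 × 15 = 240 and never carries into the neighbouring lane. This is presumably why 4 bits per value is the cut-off for 8 lanes; by the same arithmetic 32 × 2047 = 65504 still fits in a 16-bit lane at 11 bits per value.

```java
final class PackedPrefixSumSketch {
  static final int BLOCK_SIZE = 128;
  static final int LANES = 8;                    // 8-bit lanes per 64-bit long
  static final int LONGS = BLOCK_SIZE / LANES;   // 16 longs per block

  // Hypothetical layout: long i holds deltas i, 16+i, ..., 112+i,
  // one per 8-bit lane, with lane 0 in the most significant byte.
  static long[] pack(int[] deltas) {
    long[] packed = new long[LONGS];
    for (int lane = 0; lane < LANES; lane++) {
      int shift = 56 - 8 * lane;
      for (int i = 0; i < LONGS; i++) {
        packed[i] |= ((long) deltas[lane * LONGS + i]) << shift;
      }
    }
    return packed;
  }

  // Computes base + running sum of the 128 deltas.
  static int[] prefixSum(long[] packed, int base) {
    long[] sums = packed.clone();
    // One 64-bit addition advances the running sums of all 8 lanes at once.
    // Safe only because every lane total stays below 256 (16 * 15 = 240).
    for (int i = 1; i < LONGS; i++) {
      sums[i] += sums[i - 1];
    }
    int[] out = new int[BLOCK_SIZE];
    int laneBase = base;
    for (int lane = 0; lane < LANES; lane++) {
      int shift = 56 - 8 * lane;
      for (int i = 0; i < LONGS; i++) {
        // Expand the lane and add the total of everything before this lane.
        out[lane * LONGS + i] = laneBase + (int) ((sums[i] >>> shift) & 0xFF);
      }
      laneBase = out[lane * LONGS + LONGS - 1];
    }
    return out;
  }

  public static void main(String[] args) {
    int[] deltas = new int[BLOCK_SIZE];
    java.util.Arrays.fill(deltas, 1);            // e.g. a very dense posting list
    int[] docIds = prefixSum(pack(deltas), 0);
    System.out.println(docIds[0] + " ... " + docIds[BLOCK_SIZE - 1]);
  }
}
```

The inner loop performs 15 long additions for 128 deltas instead of one addition per delta. The remaining cost is expanding the lanes and adding the per-lane bases.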
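The shifts quoted above for 5 bits per value (11, 6 and 1, plus one leftover bit per 16-bit integer) can be sketched on a deliberately simplified layout. The example below is an assumption for illustration only: it works on standalone 16-bit words held in `int`s and packs 16 values into 5 words, whereas the real code operates on several 16-bit integers packed side by side in each long. The names `FiveBitsIn16Sketch`, `pack` and `unpack` are hypothetical.

```java
final class FiveBitsIn16Sketch {
  static final int BPV = 5;
  static final int MASK = (1 << BPV) - 1;  // 0x1F, keeps 5 bits
  static final int WORDS = 5;              // 5 words * 16 bits = 16 values * 5 bits

  // Packs 16 values in [0, 32) into five 16-bit words (stored in ints).
  // Word w holds values[w], values[5 + w], values[10 + w] in its top 15 bits
  // and one bit of the tail value values[15] in its lowest bit.
  static int[] pack(int[] values) {
    int[] words = new int[WORDS];
    int tail = values[15];
    for (int w = 0; w < WORDS; w++) {
      int tailBit = (tail >>> (WORDS - 1 - w)) & 1;
      words[w] = (values[w] << 11)
          | (values[WORDS + w] << 6)
          | (values[2 * WORDS + w] << 1)
          | tailBit;
    }
    return words;
  }

  static int[] unpack(int[] words) {
    int[] values = new int[16];
    int tail = 0;
    for (int w = 0; w < WORDS; w++) {
      int word = words[w];
      values[w] = (word >>> 11) & MASK;             // shift by 16 - 5
      values[WORDS + w] = (word >>> 6) & MASK;      // shift by 16 - 2*5
      values[2 * WORDS + w] = (word >>> 1) & MASK;  // shift by 16 - 3*5
      tail = (tail << 1) | (word & 1);              // collect the remaining bit
    }
    values[15] = tail;                              // tail value rebuilt from 5 leftover bits
    return values;
  }
}
```

Three of every sixteen bits positions are consumed by plain shift-and-mask operations, and only the tail value needs bit-by-bit reassembly. This keeps most of the decoding in a form that applies equally to every 16-bit integer in a long.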
Before (without enabling the vector module):

```
Benchmark                                       (bpv)   Mode  Cnt   Score   Error   Units
PostingIndexInputBenchmark.decode                   2  thrpt   15  55,166 ± 0,480  ops/us
PostingIndexInputBenchmark.decode                   3  thrpt   15  51,257 ± 0,086  ops/us
PostingIndexInputBenchmark.decode                   4  thrpt   15  55,769 ± 0,271  ops/us
PostingIndexInputBenchmark.decode                   5  thrpt   15  50,789 ± 0,062  ops/us
PostingIndexInputBenchmark.decode                   6  thrpt   15  49,804 ± 0,187  ops/us
PostingIndexInputBenchmark.decode                   7  thrpt   15  47,552 ± 0,522  ops/us
PostingIndexInputBenchmark.decode                   8  thrpt   15  61,442 ± 0,330  ops/us
PostingIndexInputBenchmark.decode                   9  thrpt   15  40,030 ± 0,084  ops/us
PostingIndexInputBenchmark.decode                  10  thrpt   15  41,480 ± 0,458  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       2  thrpt   15  20,808 ± 0,063  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       3  thrpt   15  20,181 ± 0,184  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       4  thrpt   15  20,931 ± 0,141  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       5  thrpt   15  20,195 ± 0,118  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       6  thrpt   15  20,304 ± 0,185  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       7  thrpt   15  19,334 ± 0,204  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       8  thrpt   15  21,579 ± 0,071  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       9  thrpt   15  17,726 ± 0,253  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum      10  thrpt   15  18,021 ± 0,267  ops/us
```

After (without enabling the vector module):

```
Benchmark                                       (bpv)   Mode  Cnt   Score   Error   Units
PostingIndexInputBenchmark.decode                   2  thrpt   15  57,723 ± 0,501  ops/us
PostingIndexInputBenchmark.decode                   3  thrpt   15  52,526 ± 0,255  ops/us
PostingIndexInputBenchmark.decode                   4  thrpt   15  57,395 ± 0,424  ops/us
PostingIndexInputBenchmark.decode                   5  thrpt   15  50,513 ± 0,076  ops/us
PostingIndexInputBenchmark.decode                   6  thrpt   15  47,176 ± 0,146  ops/us
PostingIndexInputBenchmark.decode                   7  thrpt   15  44,838 ± 0,138  ops/us
PostingIndexInputBenchmark.decode                   8  thrpt   15  61,604 ± 0,262  ops/us
PostingIndexInputBenchmark.decode                   9  thrpt   15  37,737 ± 0,057  ops/us
PostingIndexInputBenchmark.decode                  10  thrpt   15  37,079 ± 0,364  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       2  thrpt   15  30,823 ± 0,052  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       3  thrpt   15  28,148 ± 0,155  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       4  thrpt   15  30,545 ± 0,059  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       5  thrpt   15  22,332 ± 0,100  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       6  thrpt   15  22,240 ± 0,029  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       7  thrpt   15  21,809 ± 0,038  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       8  thrpt   15  23,279 ± 0,376  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       9  thrpt   15  21,624 ± 0,073  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum      10  thrpt   15  21,348 ± 0,033  ops/us
```

After (with enabling the vector module - Apple M3 - preferredBitSize=128):

```
Benchmark                                       (bpv)   Mode  Cnt   Score   Error   Units
PostingIndexInputBenchmark.decode                   2  thrpt   15  51,391 ± 0,171  ops/us
PostingIndexInputBenchmark.decode                   3  thrpt   15  58,821 ± 1,314  ops/us
PostingIndexInputBenchmark.decode                   4  thrpt   15  65,562 ± 0,412  ops/us
PostingIndexInputBenchmark.decode                   5  thrpt   15  58,495 ± 1,771  ops/us
PostingIndexInputBenchmark.decode                   6  thrpt   15  57,617 ± 0,201  ops/us
PostingIndexInputBenchmark.decode                   7  thrpt   15  52,641 ± 0,330  ops/us
PostingIndexInputBenchmark.decode                   8  thrpt   15  61,280 ± 0,488  ops/us
PostingIndexInputBenchmark.decode                   9  thrpt   15  45,320 ± 0,927  ops/us
PostingIndexInputBenchmark.decode                  10  thrpt   15  46,437 ± 0,131  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       2  thrpt   15  28,179 ± 0,408  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       3  thrpt   15  29,989 ± 0,261  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       4  thrpt   15  31,706 ± 0,279  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       5  thrpt   15  21,305 ± 0,031  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       6  thrpt   15  25,399 ± 0,034  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       7  thrpt   15  25,336 ± 0,081  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       8  thrpt   15  27,047 ± 0,262  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum       9  thrpt   15  24,075 ± 0,044  ops/us
PostingIndexInputBenchmark.decodeAndPrefixSum      10  thrpt   15  24,549 ± 0,273  ops/us
```