RamakrishnaChilaka opened a new pull request, #15110: URL: https://github.com/apache/lucene/pull/15110
PostingsDecodingUtil: interchange loops to enable better memory access and SIMD vectorisation.

### Description

Rewrite `splitInts()` so that the innermost loop walks the array contiguously instead of in a strided manner. This gives HotSpot C2 a chance to vectorise the shift-and-mask operations and also improves cache locality.

No functional change: the temporary buffer `b` is only accessed inside this method, and its layout order is irrelevant to callers.

### Performance comparison on Intel i5-13600K

Full JMH results are in the [benchmark repo](https://github.com/RamakrishnaChilaka/postings-decoding-util). Throughput results (higher is better):

```
Benchmark                                   (count)  (dec)   Mode  Cnt      Score     Error   Units
BitUnpackerBenchmark.baselineScalar              32      1  thrpt   25   1667.831 ±   5.508  ops/ms
BitUnpackerBenchmark.baselineScalar              32      2  thrpt   25   3007.441 ±  54.965  ops/ms
BitUnpackerBenchmark.baselineScalar              32      4  thrpt   25   5837.677 ±  50.676  ops/ms
BitUnpackerBenchmark.baselineScalar              32      8  thrpt   25  10565.109 ± 133.238  ops/ms
BitUnpackerBenchmark.baselineScalar            1024      1  thrpt   25     13.448 ±   0.036  ops/ms
BitUnpackerBenchmark.baselineScalar            1024      2  thrpt   25     25.123 ±   0.024  ops/ms
BitUnpackerBenchmark.baselineScalar            1024      4  thrpt   25    210.144 ±   2.174  ops/ms
BitUnpackerBenchmark.baselineScalar            1024      8  thrpt   25    379.248 ±   9.067  ops/ms
BitUnpackerBenchmark.optimizedWithFastPath       32      1  thrpt   25   3265.182 ±  10.690  ops/ms
BitUnpackerBenchmark.optimizedWithFastPath       32      2  thrpt   25   6507.689 ±  19.246  ops/ms
BitUnpackerBenchmark.optimizedWithFastPath       32      4  thrpt   25  12516.050 ±  72.828  ops/ms
BitUnpackerBenchmark.optimizedWithFastPath       32      8  thrpt   25  25706.735 ± 105.216  ops/ms
BitUnpackerBenchmark.optimizedWithFastPath     1024      1  thrpt   25    132.753 ±   0.223  ops/ms
BitUnpackerBenchmark.optimizedWithFastPath     1024      2  thrpt   25    293.396 ±   0.527  ops/ms
BitUnpackerBenchmark.optimizedWithFastPath     1024      4  thrpt   25    597.617 ±  32.484  ops/ms
BitUnpackerBenchmark.optimizedWithFastPath     1024      8  thrpt   25   1202.047 ±  38.059  ops/ms
```

Summarising (scores in ops/ms):

| count | dec | baseline | optimized | **factor** |
| ----: | --: | -------: | --------: | ---------: |
| 32 | 1 | 1 668 | 3 265 | **1.96×** |
| 32 | 2 | 3 007 | 6 508 | **2.16×** |
| 32 | 4 | 5 838 | 12 516 | **2.14×** |
| 32 | 8 | 10 565 | 25 707 | **2.43×** |
| 1024 | 1 | 13.4 | 132.8 | **9.9×** |
| 1024 | 2 | 25.1 | 293.4 | **11.7×** |
| 1024 | 4 | 210 | 597.6 | **2.84×** |
| 1024 | 8 | 379 | 1 202 | **3.17×** |

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
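For readers who want to see the loop interchange from the description in isolation, here is a minimal, self-contained sketch. It is not the PR's diff: the class `SplitIntsSketch` and the methods `splitStrided` and `splitContiguous` are hypothetical names. The parameter list is only modelled on the shift-and-mask idea in the description, and reading from the index input is left out. In this sketch both variants write the same indices of `b`, so their results can be compared directly.

```java
import java.util.Arrays;

// Hypothetical illustration of loop interchange for a shift-and-mask split.
// Not the Lucene implementation; names and signatures are assumptions.
public class SplitIntsSketch {

  // Strided variant: the inner loop jumps `count` ints between writes to b.
  static void splitStrided(
      int count, int[] b, int bShift, int dec, int bMask, int[] c, int cMask) {
    int maxIter = (bShift - 1) / dec;
    for (int i = 0; i < count; ++i) {
      for (int j = 0; j <= maxIter; ++j) {
        b[count * j + i] = (c[i] >>> (bShift - j * dec)) & bMask;
      }
      c[i] &= cMask;
    }
  }

  // Interchanged variant: the inner loop walks b and c contiguously with a
  // loop-invariant shift, a shape that auto-vectorisers tend to handle well.
  static void splitContiguous(
      int count, int[] b, int bShift, int dec, int bMask, int[] c, int cMask) {
    int maxIter = (bShift - 1) / dec;
    for (int j = 0; j <= maxIter; ++j) {
      int shift = bShift - j * dec;
      int base = count * j;
      for (int i = 0; i < count; ++i) {
        b[base + i] = (c[i] >>> shift) & bMask;
      }
    }
    // Masking c is deferred until every block of b has read the original values.
    for (int i = 0; i < count; ++i) {
      c[i] &= cMask;
    }
  }

  public static void main(String[] args) {
    int count = 32, bShift = 12, dec = 4, bMask = 0xF, cMask = 0xF;
    int blocks = (bShift - 1) / dec + 1;

    int[] c1 = new int[count];
    for (int i = 0; i < count; ++i) {
      c1[i] = i * 0x9E3779B1; // arbitrary deterministic bit patterns
    }
    int[] c2 = c1.clone();
    int[] b1 = new int[count * blocks];
    int[] b2 = new int[count * blocks];

    splitStrided(count, b1, bShift, dec, bMask, c1, cMask);
    splitContiguous(count, b2, bShift, dec, bMask, c2, cMask);

    System.out.println("b identical: " + Arrays.equals(b1, b2));
    System.out.println("c identical: " + Arrays.equals(c1, c2));
  }
}
```

Whether C2 actually vectorises the interchanged loop depends on the JVM version and CPU, so any speed-up should be confirmed with a JMH run like the one reported above, not assumed from the loop shape alone.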