RamakrishnaChilaka opened a new pull request, #15198: URL: https://github.com/apache/lucene/pull/15198
This PR optimizes the expand8 routine by leveraging the JDK Vector API.

#### Benchmarks

I validated performance using a standalone benchmark (see [postings_expand_benchmark](https://github.com/RamakrishnaChilaka/-postings_expand_benchmark)) with block_size 256. The key takeaways are as follows.

| Benchmark         | Mode  | Cnt | Score   | Error   | Units  |
|-------------------|-------|-----|---------|---------|--------|
| expand16 (Scalar) | thrpt | 5   | 112.842 | ± 0.221 | ops/us |
| expand16 (Vector) | thrpt | 5   | 105.594 | ± 1.307 | ops/us |
| expand8 (Scalar)  | thrpt | 5   | 66.726  | ± 0.452 | ops/us |
| expand8 (Vector)  | thrpt | 5   | 105.821 | ± 0.272 | ops/us |

* **expand8**: The vectorized version is ~59% faster than scalar (66.7 → 105.8 ops/us).
* **expand16**: Scalar slightly outperforms vector (112.8 vs 105.6 ops/us).

### Lucene Microbenchmarks

```
baseline
Benchmark                                (bpv)   Mode  Cnt   Score   Error   Units
PostingIndexInputBenchmark.decode            2  thrpt   15  35.409 ± 0.120  ops/us
PostingIndexInputBenchmark.decode            3  thrpt   15  29.128 ± 0.017  ops/us
PostingIndexInputBenchmark.decode            4  thrpt   15  41.492 ± 0.305  ops/us
PostingIndexInputBenchmark.decode            5  thrpt   15  32.205 ± 0.350  ops/us
PostingIndexInputBenchmark.decode            6  thrpt   15  31.237 ± 0.245  ops/us
PostingIndexInputBenchmark.decode            7  thrpt   15  29.984 ± 0.582  ops/us
PostingIndexInputBenchmark.decode            8  thrpt   15  56.366 ± 0.134  ops/us
PostingIndexInputBenchmark.decode            9  thrpt   15  22.802 ± 0.077  ops/us
PostingIndexInputBenchmark.decode           10  thrpt   15  23.502 ± 0.037  ops/us
PostingIndexInputBenchmark.decodeVector      2  thrpt   15  53.151 ± 0.070  ops/us
PostingIndexInputBenchmark.decodeVector      3  thrpt   15  48.863 ± 1.455  ops/us
PostingIndexInputBenchmark.decodeVector      4  thrpt   15  54.284 ± 2.195  ops/us
PostingIndexInputBenchmark.decodeVector      5  thrpt   15  39.302 ± 0.659  ops/us
PostingIndexInputBenchmark.decodeVector      6  thrpt   15  38.414 ± 0.830  ops/us
PostingIndexInputBenchmark.decodeVector      7  thrpt   15  39.609 ± 0.551  ops/us
PostingIndexInputBenchmark.decodeVector      8  thrpt   15  56.373 ± 0.118  ops/us
PostingIndexInputBenchmark.decodeVector      9  thrpt   15  27.295 ± 0.351  ops/us
PostingIndexInputBenchmark.decodeVector     10  thrpt   15  30.058 ± 0.172  ops/us

contender
Benchmark                                (bpv)   Mode  Cnt   Score   Error   Units
PostingIndexInputBenchmark.decode            2  thrpt   15  35.238 ± 0.209  ops/us
PostingIndexInputBenchmark.decode            3  thrpt   15  29.214 ± 0.098  ops/us
PostingIndexInputBenchmark.decode            4  thrpt   15  41.559 ± 0.580  ops/us
PostingIndexInputBenchmark.decode            5  thrpt   15  32.543 ± 0.175  ops/us
PostingIndexInputBenchmark.decode            6  thrpt   15  31.323 ± 0.061  ops/us
PostingIndexInputBenchmark.decode            7  thrpt   15  29.525 ± 0.315  ops/us
PostingIndexInputBenchmark.decode            8  thrpt   15  52.348 ± 0.079  ops/us
PostingIndexInputBenchmark.decode            9  thrpt   15  24.919 ± 0.056  ops/us
PostingIndexInputBenchmark.decode           10  thrpt   15  26.581 ± 0.049  ops/us
PostingIndexInputBenchmark.decodeVector      2  thrpt   15  71.223 ± 6.921  ops/us
PostingIndexInputBenchmark.decodeVector      3  thrpt   15  53.237 ± 1.962  ops/us
PostingIndexInputBenchmark.decodeVector      4  thrpt   15  73.437 ± 0.284  ops/us
PostingIndexInputBenchmark.decodeVector      5  thrpt   15  41.201 ± 2.067  ops/us
PostingIndexInputBenchmark.decodeVector      6  thrpt   15  46.622 ± 0.289  ops/us
PostingIndexInputBenchmark.decodeVector      7  thrpt   15  45.505 ± 1.044  ops/us
PostingIndexInputBenchmark.decodeVector      8  thrpt   15  58.368 ± 0.977  ops/us
PostingIndexInputBenchmark.decodeVector      9  thrpt   15  27.243 ± 0.358  ops/us
PostingIndexInputBenchmark.decodeVector     10  thrpt   15  30.059 ± 0.105  ops/us
```

### Summary

bpv 9 and 10 use a primitive size of 16 and therefore do not go through expand8, so their vector performance is unchanged.
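For readers unfamiliar with the routine, the following is a minimal scalar sketch of what an expand8-style step does and why only bpv ≤ 8 would be affected. It is not the code from this PR: the class name, the `primitiveSize` helper, the `packedCount` parameter, and the exact output layout are assumptions made for illustration. It assumes each packed `int` holds four 8-bit values and that they are spread into four consecutive regions of the same array.

```java
import java.util.Arrays;

/** Illustrative only: names and layout are assumptions, not code from the PR. */
final class Expand8Sketch {

  /** Hypothetical helper: lane width in bits used to hold values of the given bpv. */
  static int primitiveSize(int bitsPerValue) {
    if (bitsPerValue <= 8) {
      return 8;
    }
    if (bitsPerValue <= 16) {
      return 16;
    }
    return 32;
  }

  /**
   * Scalar expand8 sketch: the first packedCount ints each hold four 8-bit
   * values. Byte k (most significant first) of arr[i] is written to
   * arr[k * packedCount + i]. The array must have room for 4 * packedCount ints.
   */
  static void expand8(int[] arr, int packedCount) {
    for (int i = 0; i < packedCount; i++) {
      int l = arr[i]; // read the packed word before overwriting slot i
      arr[i] = (l >>> 24) & 0xFF;
      arr[packedCount + i] = (l >>> 16) & 0xFF;
      arr[2 * packedCount + i] = (l >>> 8) & 0xFF;
      arr[3 * packedCount + i] = l & 0xFF;
    }
  }

  public static void main(String[] args) {
    int[] arr = new int[8];
    arr[0] = 0x01020304;
    arr[1] = 0x0A0B0C0D;
    expand8(arr, 2);
    System.out.println(Arrays.toString(arr)); // [1, 10, 2, 11, 3, 12, 4, 13]
    System.out.println(primitiveSize(7) + " " + primitiveSize(9)); // 8 16
  }
}
```

Each loop iteration is independent of the others and consists only of shifts and masks, which is the kind of loop that maps naturally onto SIMD lanes. The Vector API version in the PR is not reproduced here.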
| bpv | baseline vector (ops/μs) | contender vector (ops/μs) |        Δ |
| --: | -----------------------: | ------------------------: | -------: |
|   2 |                     53.2 |                      71.2 | +33.8 %  |
|   3 |                     48.9 |                      53.2 |  +8.8 %  |
|   4 |                     54.3 |                      73.4 | +35.2 %  |
|   5 |                     39.3 |                      41.2 |  +4.8 %  |
|   6 |                     38.4 |                      46.6 | +21.4 %  |
|   7 |                     39.6 |                      45.5 | +14.9 %  |
|   8 |                     56.4 |                      58.4 |  +3.5 %  |
|   9 |                     27.3 |                      27.2 |  -0.4 %  |
|  10 |                     30.1 |                      30.1 |   0.0 %  |
