gf2121 opened a new pull request, #14361: URL: https://github.com/apache/lucene/pull/14361
This PR tries another way to implement the idea of https://github.com/apache/lucene/pull/13521, taking advantage of an auto-vectorized loop to decode ints, like we did for bpv24 in https://github.com/apache/lucene/pull/14203. One thing that needs to be pointed out is that the remainder loop does not get vectorized (again!) since `512 / 3 = 170` is not a multiple of 8, hence the `floorToMultipleOf8` trick.

**Mac M2**

```
Benchmark                                             Mode  Cnt    Score   Error   Units
Decode21Benchmark.decode21Scalar                     thrpt    5   92.405 ± 0.521  ops/ms
Decode21Benchmark.decode21Vector                     thrpt    5  108.325 ± 1.517  ops/ms
Decode21Benchmark.decode21VectorFloorToMultipleOf8   thrpt    5  141.691 ± 3.948  ops/ms
```

**Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (AVX 512)**

```
Benchmark                                                 Mode  Cnt   Score   Error   Units
Decode21Benchmark.decode21Scalar                         thrpt    5  29.134 ± 0.087  ops/ms
Decode21Benchmark.decode21Scalar:asm                     thrpt         NaN              ---
Decode21Benchmark.decode21Vector                         thrpt    5  45.180 ± 0.479  ops/ms
Decode21Benchmark.decode21Vector:asm                     thrpt         NaN              ---
Decode21Benchmark.decode21VectorFloorToMultipleOf8       thrpt    5  76.330 ± 2.858  ops/ms
Decode21Benchmark.decode21VectorFloorToMultipleOf8:asm   thrpt         NaN              ---
```

cc @expani who raised this neat idea.
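To make the decoding idea described above concrete, here is a minimal, hypothetical sketch; it is not the code from this PR. It assumes every value fits in 21 bits, so three values fit in one 64-bit long (3 × 21 = 63 bits). It also assumes a striped layout where long `i` holds values `i`, `n + i` and `2n + i` (with `n` the number of packed longs), so each of the three output streams is written contiguously, a loop shape that HotSpot C2 may auto-vectorize. The names `Decode21Sketch`, `encode21`, `decode21` and `MASK_21` are made up for illustration; only `floorToMultipleOf8` mirrors a name from the description, and its use here (rounding `count / 3` down so the main loop's trip count is a multiple of 8) is an assumption.

```java
import java.util.Arrays;
import java.util.Random;

public class Decode21Sketch {
  // Hypothetical constant: mask selecting the low 21 bits.
  static final int MASK_21 = (1 << 21) - 1;

  // Round n down to the nearest multiple of 8 by clearing its low 3 bits.
  static int floorToMultipleOf8(int n) {
    return n & ~7;
  }

  // Hypothetical encoder: long i packs values i, packedLongs + i and 2 * packedLongs + i
  // (high, middle and low 21 bits). Leftover values are stored one per long, an assumption
  // made only to keep the sketch short.
  static long[] encode21(int[] values, int count) {
    int packedLongs = floorToMultipleOf8(count / 3);
    int packedValues = packedLongs * 3;
    int tail = count - packedValues;
    long[] out = new long[packedLongs + tail];
    for (int i = 0; i < packedLongs; i++) {
      out[i] =
          ((long) values[i] << 42)
              | ((long) values[packedLongs + i] << 21)
              | (long) values[2 * packedLongs + i];
    }
    for (int i = 0; i < tail; i++) {
      out[packedLongs + i] = values[packedValues + i];
    }
    return out;
  }

  // Hypothetical decoder matching encode21.
  static void decode21(long[] in, int count, int[] out) {
    int packedLongs = floorToMultipleOf8(count / 3);
    int packedValues = packedLongs * 3;
    // Main loop: trip count is a multiple of 8 and each store stream is contiguous.
    for (int i = 0; i < packedLongs; i++) {
      long l = in[i];
      out[i] = (int) (l >>> 42);
      out[packedLongs + i] = (int) (l >>> 21) & MASK_21;
      out[2 * packedLongs + i] = (int) l & MASK_21;
    }
    // Scalar tail for the values that did not go through the main loop.
    for (int i = packedValues; i < count; i++) {
      out[i] = (int) in[packedLongs + i - packedValues];
    }
  }

  public static void main(String[] args) {
    int count = 512; // the value count used in the description
    Random random = new Random(42);
    int[] values = new int[count];
    for (int i = 0; i < count; i++) {
      values[i] = random.nextInt(1 << 21); // every value must fit in 21 bits
    }
    long[] packed = encode21(values, count);
    int[] decoded = new int[count];
    decode21(packed, count, decoded);
    System.out.println(floorToMultipleOf8(count / 3)); // 168
    System.out.println(Arrays.equals(values, decoded)); // true
  }
}
```

With `count = 512`, `count / 3 = 170` is rounded down to 168 longs, so the main loop covers 504 values and the remaining 8 go through the scalar tail. Storing the tail one value per long wastes some space; a real on-disk format would presumably pack it as well.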