gf2121 commented on PR #14910:
URL: https://github.com/apache/lucene/pull/14910#issuecomment-3066683054
JMH results with the vectorized implementations:

```
Benchmark                                               (bitCount)   Mode  Cnt   Score   Error   Units
BitsetToArrayBenchmark.dense                                     5  thrpt    5   9.583 ± 0.238  ops/us
BitsetToArrayBenchmark.dense                                    10  thrpt    5   6.926 ± 0.151  ops/us
BitsetToArrayBenchmark.dense                                    20  thrpt    5   4.597 ± 0.042  ops/us
BitsetToArrayBenchmark.dense                                    30  thrpt    5   3.420 ± 0.033  ops/us
BitsetToArrayBenchmark.dense                                    40  thrpt    5   3.766 ± 0.013  ops/us
BitsetToArrayBenchmark.dense                                    50  thrpt    5   5.299 ± 0.126  ops/us
BitsetToArrayBenchmark.dense                                    60  thrpt    5   8.991 ± 0.223  ops/us
BitsetToArrayBenchmark.denseBranchLess                           5  thrpt    5  13.520 ± 0.132  ops/us
BitsetToArrayBenchmark.denseBranchLess                          10  thrpt    5  13.440 ± 0.575  ops/us
BitsetToArrayBenchmark.denseBranchLess                          20  thrpt    5  13.521 ± 0.289  ops/us
BitsetToArrayBenchmark.denseBranchLess                          30  thrpt    5  13.488 ± 0.641  ops/us
BitsetToArrayBenchmark.denseBranchLess                          40  thrpt    5  13.501 ± 0.375  ops/us
BitsetToArrayBenchmark.denseBranchLess                          50  thrpt    5  13.555 ± 0.384  ops/us
BitsetToArrayBenchmark.denseBranchLess                          60  thrpt    5  13.524 ± 0.498  ops/us
BitsetToArrayBenchmark.denseBranchLessCmov                       5  thrpt    5   8.521 ± 0.120  ops/us
BitsetToArrayBenchmark.denseBranchLessCmov                      10  thrpt    5   6.315 ± 0.164  ops/us
BitsetToArrayBenchmark.denseBranchLessCmov                      20  thrpt    5  11.531 ± 0.176  ops/us
BitsetToArrayBenchmark.denseBranchLessCmov                      30  thrpt    5  11.493 ± 0.255  ops/us
BitsetToArrayBenchmark.denseBranchLessCmov                      40  thrpt    5  11.535 ± 0.018  ops/us
BitsetToArrayBenchmark.denseBranchLessCmov                      50  thrpt    5  11.539 ± 0.084  ops/us
BitsetToArrayBenchmark.denseBranchLessCmov                      60  thrpt    5   9.100 ± 0.017  ops/us
BitsetToArrayBenchmark.denseBranchLessParallel                   5  thrpt    5  15.428 ± 0.155  ops/us
BitsetToArrayBenchmark.denseBranchLessParallel                  10  thrpt    5  15.424 ± 0.282  ops/us
BitsetToArrayBenchmark.denseBranchLessParallel                  20  thrpt    5  15.375 ± 0.341  ops/us
BitsetToArrayBenchmark.denseBranchLessParallel                  30  thrpt    5  15.395 ± 0.121  ops/us
BitsetToArrayBenchmark.denseBranchLessParallel                  40  thrpt    5  15.308 ± 0.407  ops/us
BitsetToArrayBenchmark.denseBranchLessParallel                  50  thrpt    5  15.322 ± 0.174  ops/us
BitsetToArrayBenchmark.denseBranchLessParallel                  60  thrpt    5  15.439 ± 0.064  ops/us
BitsetToArrayBenchmark.denseBranchLessUnrolling                  5  thrpt    5  15.795 ± 0.380  ops/us
BitsetToArrayBenchmark.denseBranchLessUnrolling                 10  thrpt    5  15.827 ± 0.228  ops/us
BitsetToArrayBenchmark.denseBranchLessUnrolling                 20  thrpt    5  15.672 ± 0.991  ops/us
BitsetToArrayBenchmark.denseBranchLessUnrolling                 30  thrpt    5  15.789 ± 0.327  ops/us
BitsetToArrayBenchmark.denseBranchLessUnrolling                 40  thrpt    5  15.764 ± 0.350  ops/us
BitsetToArrayBenchmark.denseBranchLessUnrolling                 50  thrpt    5  15.725 ± 0.393  ops/us
BitsetToArrayBenchmark.denseBranchLessUnrolling                 60  thrpt    5  15.868 ± 0.028  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized                 5  thrpt    5  25.889 ± 0.471  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized                10  thrpt    5  25.975 ± 0.129  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized                20  thrpt    5  25.852 ± 0.299  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized                30  thrpt    5  25.888 ± 0.371  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized                40  thrpt    5  25.708 ± 1.028  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized                50  thrpt    5  25.856 ± 0.612  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized                60  thrpt    5  25.931 ± 0.144  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512              5  thrpt    5  28.221 ± 0.545  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512             10  thrpt    5  28.306 ± 0.209  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512             20  thrpt    5  26.827 ± 1.704  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512             30  thrpt    5  27.027 ± 0.214  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512             40  thrpt    5  26.504 ± 0.909  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512             50  thrpt    5  25.725 ± 0.084  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512             60  thrpt    5  25.495 ± 1.521  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2          5  thrpt    5   1.137 ± 0.473  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2         10  thrpt    5   0.856 ± 0.312  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2         20  thrpt    5   0.171 ± 0.091  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2         30  thrpt    5   0.159 ± 0.072  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2         40  thrpt    5   0.097 ± 0.042  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2         50  thrpt    5   0.069 ± 0.021  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2         60  thrpt    5   0.068 ± 0.041  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2             5  thrpt    5  20.310 ± 0.139  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2            10  thrpt    5  20.125 ± 0.352  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2            20  thrpt    5  19.961 ± 0.653  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2            30  thrpt    5  20.025 ± 1.040  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2            40  thrpt    5  20.051 ± 0.556  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2            50  thrpt    5  20.128 ± 0.131  ops/us
BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2            60  thrpt    5  19.769 ± 2.266  ops/us
BitsetToArrayBenchmark.denseInvert                               5  thrpt    5  19.958 ± 0.355  ops/us
BitsetToArrayBenchmark.denseInvert                              10  thrpt    5  13.497 ± 0.826  ops/us
BitsetToArrayBenchmark.denseInvert                              20  thrpt    5   6.995 ± 0.093  ops/us
BitsetToArrayBenchmark.denseInvert                              30  thrpt    5   4.579 ± 0.035  ops/us
BitsetToArrayBenchmark.denseInvert                              40  thrpt    5   4.447 ± 0.028  ops/us
BitsetToArrayBenchmark.denseInvert                              50  thrpt    5   4.082 ± 0.051  ops/us
BitsetToArrayBenchmark.denseInvert                              60  thrpt    5   6.732 ± 0.145  ops/us
BitsetToArrayBenchmark.forLoop                                   5  thrpt    5  26.332 ± 0.080  ops/us
BitsetToArrayBenchmark.forLoop                                  10  thrpt    5  21.765 ± 0.029  ops/us
BitsetToArrayBenchmark.forLoop                                  20  thrpt    5  15.878 ± 0.247  ops/us
BitsetToArrayBenchmark.forLoop                                  30  thrpt    5  12.606 ± 0.251  ops/us
BitsetToArrayBenchmark.forLoop                                  40  thrpt    5  10.440 ± 0.036  ops/us
BitsetToArrayBenchmark.forLoop                                  50  thrpt    5   8.875 ± 0.164  ops/us
BitsetToArrayBenchmark.forLoop                                  60  thrpt    5   7.735 ± 0.171  ops/us
BitsetToArrayBenchmark.forLoopManualUnrolling                    5  thrpt    5  26.018 ± 0.586  ops/us
BitsetToArrayBenchmark.forLoopManualUnrolling                   10  thrpt    5  21.031 ± 0.364  ops/us
BitsetToArrayBenchmark.forLoopManualUnrolling                   20  thrpt    5  15.683 ± 0.266  ops/us
BitsetToArrayBenchmark.forLoopManualUnrolling                   30  thrpt    5  12.502 ± 0.056  ops/us
BitsetToArrayBenchmark.forLoopManualUnrolling                   40  thrpt    5  10.330 ± 0.212  ops/us
BitsetToArrayBenchmark.forLoopManualUnrolling                   50  thrpt    5   8.842 ± 0.020  ops/us
BitsetToArrayBenchmark.forLoopManualUnrolling                   60  thrpt    5   7.705 ± 0.172  ops/us
BitsetToArrayBenchmark.hybrid                                    5  thrpt    5  25.588 ± 0.491  ops/us
BitsetToArrayBenchmark.hybrid                                   10  thrpt    5  21.151 ± 0.403  ops/us
BitsetToArrayBenchmark.hybrid                                   20  thrpt    5  15.653 ± 0.263  ops/us
BitsetToArrayBenchmark.hybrid                                   30  thrpt    5  12.431 ± 0.027  ops/us
BitsetToArrayBenchmark.hybrid                                   40  thrpt    5  15.414 ± 0.032  ops/us
BitsetToArrayBenchmark.hybrid                                   50  thrpt    5  15.415 ± 0.065  ops/us
BitsetToArrayBenchmark.hybrid                                   60  thrpt    5  15.188 ± 0.806  ops/us
BitsetToArrayBenchmark.whileLoop                                 5  thrpt    5  29.224 ± 0.503  ops/us
BitsetToArrayBenchmark.whileLoop                                10  thrpt    5  23.237 ± 0.697  ops/us
BitsetToArrayBenchmark.whileLoop                                20  thrpt    5  16.777 ± 0.278  ops/us
BitsetToArrayBenchmark.whileLoop                                30  thrpt    5  13.019 ± 0.213  ops/us
BitsetToArrayBenchmark.whileLoop                                40  thrpt    5  10.700 ± 0.095  ops/us
BitsetToArrayBenchmark.whileLoop                                50  thrpt    5   9.047 ± 0.015  ops/us
BitsetToArrayBenchmark.whileLoop                                60  thrpt    5   7.786 ± 0.224  ops/us
```
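The results contrast loops whose throughput falls as `bitCount` rises (`whileLoop` and `forLoop` go from about 26 to 29 ops/us at `bitCount` 5 down to about 8 ops/us at 60) with branchless variants that stay flat (`denseBranchLess` holds at about 13.5 ops/us for every `bitCount`). As a minimal sketch of those two styles, assuming `bitCount` is the number of set bits per 64-bit word, the Java below expands one word into doc IDs with a trailing-zero loop and with a branchless loop. It is not the code from PR #14910; the class and method names are hypothetical and only echo the benchmark names for readability.

```java
import java.util.Arrays;

/** Hypothetical sketch, not the PR's code: two ways to expand one 64-bit bitset word into doc IDs. */
public final class BitsetToArraySketch {

  /**
   * Visits only the set bits, so the number of iterations equals Long.bitCount(word).
   * Writes base + index of each set bit into out and returns how many were written.
   */
  public static int whileLoop(long word, int base, int[] out) {
    int count = 0;
    while (word != 0) {
      out[count++] = base + Long.numberOfTrailingZeros(word);
      word &= word - 1; // clear the lowest set bit
    }
    return count;
  }

  /**
   * Always runs 64 iterations with no data-dependent branch: each step stores unconditionally
   * and advances the cursor only when bit i is set. Since count <= i at every step, an out
   * array of length 64 is enough. Slots at or beyond the returned count may hold leftovers.
   */
  public static int denseBranchLess(long word, int base, int[] out) {
    int count = 0;
    for (int i = 0; i < Long.SIZE; i++) {
      out[count] = base + i;               // unconditional store
      count += (int) ((word >>> i) & 1L);  // keep the store only if bit i is set
    }
    return count;
  }

  public static void main(String[] args) {
    long word = 0b1011_0000_0101L; // bits 0, 2, 8, 9 and 11 are set
    int[] a = new int[64];
    int[] b = new int[64];
    int n1 = whileLoop(word, 128, a);
    int n2 = denseBranchLess(word, 128, b);
    System.out.println(n1 + " " + Arrays.toString(Arrays.copyOf(a, n1)));
    System.out.println(n2 + " " + Arrays.toString(Arrays.copyOf(b, n2)));
  }
}
```

Running `main` prints `5 [128, 130, 136, 137, 139]` twice. The trailing-zero loop does work proportional to the number of set bits, while the branchless loop does a fixed 64 steps per word, which matches the shape of the flat `denseBranchLess` rows; the PR's actual variants (including the vectorized ones) may be structured differently.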