gf2121 commented on PR #14910:
URL: https://github.com/apache/lucene/pull/14910#issuecomment-3066683054

   JMH results with the vectorized implementations:
   
   ```
   Benchmark                                                (bitCount)   Mode  Cnt   Score   Error   Units
   BitsetToArrayBenchmark.dense                                      5  thrpt    5   9.583 ± 0.238  ops/us
   BitsetToArrayBenchmark.dense                                     10  thrpt    5   6.926 ± 0.151  ops/us
   BitsetToArrayBenchmark.dense                                     20  thrpt    5   4.597 ± 0.042  ops/us
   BitsetToArrayBenchmark.dense                                     30  thrpt    5   3.420 ± 0.033  ops/us
   BitsetToArrayBenchmark.dense                                     40  thrpt    5   3.766 ± 0.013  ops/us
   BitsetToArrayBenchmark.dense                                     50  thrpt    5   5.299 ± 0.126  ops/us
   BitsetToArrayBenchmark.dense                                     60  thrpt    5   8.991 ± 0.223  ops/us
   BitsetToArrayBenchmark.denseBranchLess                            5  thrpt    5  13.520 ± 0.132  ops/us
   BitsetToArrayBenchmark.denseBranchLess                           10  thrpt    5  13.440 ± 0.575  ops/us
   BitsetToArrayBenchmark.denseBranchLess                           20  thrpt    5  13.521 ± 0.289  ops/us
   BitsetToArrayBenchmark.denseBranchLess                           30  thrpt    5  13.488 ± 0.641  ops/us
   BitsetToArrayBenchmark.denseBranchLess                           40  thrpt    5  13.501 ± 0.375  ops/us
   BitsetToArrayBenchmark.denseBranchLess                           50  thrpt    5  13.555 ± 0.384  ops/us
   BitsetToArrayBenchmark.denseBranchLess                           60  thrpt    5  13.524 ± 0.498  ops/us
   BitsetToArrayBenchmark.denseBranchLessCmov                        5  thrpt    5   8.521 ± 0.120  ops/us
   BitsetToArrayBenchmark.denseBranchLessCmov                       10  thrpt    5   6.315 ± 0.164  ops/us
   BitsetToArrayBenchmark.denseBranchLessCmov                       20  thrpt    5  11.531 ± 0.176  ops/us
   BitsetToArrayBenchmark.denseBranchLessCmov                       30  thrpt    5  11.493 ± 0.255  ops/us
   BitsetToArrayBenchmark.denseBranchLessCmov                       40  thrpt    5  11.535 ± 0.018  ops/us
   BitsetToArrayBenchmark.denseBranchLessCmov                       50  thrpt    5  11.539 ± 0.084  ops/us
   BitsetToArrayBenchmark.denseBranchLessCmov                       60  thrpt    5   9.100 ± 0.017  ops/us
   BitsetToArrayBenchmark.denseBranchLessParallel                    5  thrpt    5  15.428 ± 0.155  ops/us
   BitsetToArrayBenchmark.denseBranchLessParallel                   10  thrpt    5  15.424 ± 0.282  ops/us
   BitsetToArrayBenchmark.denseBranchLessParallel                   20  thrpt    5  15.375 ± 0.341  ops/us
   BitsetToArrayBenchmark.denseBranchLessParallel                   30  thrpt    5  15.395 ± 0.121  ops/us
   BitsetToArrayBenchmark.denseBranchLessParallel                   40  thrpt    5  15.308 ± 0.407  ops/us
   BitsetToArrayBenchmark.denseBranchLessParallel                   50  thrpt    5  15.322 ± 0.174  ops/us
   BitsetToArrayBenchmark.denseBranchLessParallel                   60  thrpt    5  15.439 ± 0.064  ops/us
   BitsetToArrayBenchmark.denseBranchLessUnrolling                   5  thrpt    5  15.795 ± 0.380  ops/us
   BitsetToArrayBenchmark.denseBranchLessUnrolling                  10  thrpt    5  15.827 ± 0.228  ops/us
   BitsetToArrayBenchmark.denseBranchLessUnrolling                  20  thrpt    5  15.672 ± 0.991  ops/us
   BitsetToArrayBenchmark.denseBranchLessUnrolling                  30  thrpt    5  15.789 ± 0.327  ops/us
   BitsetToArrayBenchmark.denseBranchLessUnrolling                  40  thrpt    5  15.764 ± 0.350  ops/us
   BitsetToArrayBenchmark.denseBranchLessUnrolling                  50  thrpt    5  15.725 ± 0.393  ops/us
   BitsetToArrayBenchmark.denseBranchLessUnrolling                  60  thrpt    5  15.868 ± 0.028  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized                  5  thrpt    5  25.889 ± 0.471  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized                 10  thrpt    5  25.975 ± 0.129  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized                 20  thrpt    5  25.852 ± 0.299  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized                 30  thrpt    5  25.888 ± 0.371  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized                 40  thrpt    5  25.708 ± 1.028  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized                 50  thrpt    5  25.856 ± 0.612  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized                 60  thrpt    5  25.931 ± 0.144  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512               5  thrpt    5  28.221 ± 0.545  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512              10  thrpt    5  28.306 ± 0.209  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512              20  thrpt    5  26.827 ± 1.704  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512              30  thrpt    5  27.027 ± 0.214  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512              40  thrpt    5  26.504 ± 0.909  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512              50  thrpt    5  25.725 ± 0.084  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512              60  thrpt    5  25.495 ± 1.521  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2           5  thrpt    5   1.137 ± 0.473  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2          10  thrpt    5   0.856 ± 0.312  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2          20  thrpt    5   0.171 ± 0.091  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2          30  thrpt    5   0.159 ± 0.072  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2          40  thrpt    5   0.097 ± 0.042  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2          50  thrpt    5   0.069 ± 0.021  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorized512AVX2          60  thrpt    5   0.068 ± 0.041  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2              5  thrpt    5  20.310 ± 0.139  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2             10  thrpt    5  20.125 ± 0.352  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2             20  thrpt    5  19.961 ± 0.653  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2             30  thrpt    5  20.025 ± 1.040  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2             40  thrpt    5  20.051 ± 0.556  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2             50  thrpt    5  20.128 ± 0.131  ops/us
   BitsetToArrayBenchmark.denseBranchLessVectorizedAVX2             60  thrpt    5  19.769 ± 2.266  ops/us
   BitsetToArrayBenchmark.denseInvert                                5  thrpt    5  19.958 ± 0.355  ops/us
   BitsetToArrayBenchmark.denseInvert                               10  thrpt    5  13.497 ± 0.826  ops/us
   BitsetToArrayBenchmark.denseInvert                               20  thrpt    5   6.995 ± 0.093  ops/us
   BitsetToArrayBenchmark.denseInvert                               30  thrpt    5   4.579 ± 0.035  ops/us
   BitsetToArrayBenchmark.denseInvert                               40  thrpt    5   4.447 ± 0.028  ops/us
   BitsetToArrayBenchmark.denseInvert                               50  thrpt    5   4.082 ± 0.051  ops/us
   BitsetToArrayBenchmark.denseInvert                               60  thrpt    5   6.732 ± 0.145  ops/us
   BitsetToArrayBenchmark.forLoop                                    5  thrpt    5  26.332 ± 0.080  ops/us
   BitsetToArrayBenchmark.forLoop                                   10  thrpt    5  21.765 ± 0.029  ops/us
   BitsetToArrayBenchmark.forLoop                                   20  thrpt    5  15.878 ± 0.247  ops/us
   BitsetToArrayBenchmark.forLoop                                   30  thrpt    5  12.606 ± 0.251  ops/us
   BitsetToArrayBenchmark.forLoop                                   40  thrpt    5  10.440 ± 0.036  ops/us
   BitsetToArrayBenchmark.forLoop                                   50  thrpt    5   8.875 ± 0.164  ops/us
   BitsetToArrayBenchmark.forLoop                                   60  thrpt    5   7.735 ± 0.171  ops/us
   BitsetToArrayBenchmark.forLoopManualUnrolling                     5  thrpt    5  26.018 ± 0.586  ops/us
   BitsetToArrayBenchmark.forLoopManualUnrolling                    10  thrpt    5  21.031 ± 0.364  ops/us
   BitsetToArrayBenchmark.forLoopManualUnrolling                    20  thrpt    5  15.683 ± 0.266  ops/us
   BitsetToArrayBenchmark.forLoopManualUnrolling                    30  thrpt    5  12.502 ± 0.056  ops/us
   BitsetToArrayBenchmark.forLoopManualUnrolling                    40  thrpt    5  10.330 ± 0.212  ops/us
   BitsetToArrayBenchmark.forLoopManualUnrolling                    50  thrpt    5   8.842 ± 0.020  ops/us
   BitsetToArrayBenchmark.forLoopManualUnrolling                    60  thrpt    5   7.705 ± 0.172  ops/us
   BitsetToArrayBenchmark.hybrid                                     5  thrpt    5  25.588 ± 0.491  ops/us
   BitsetToArrayBenchmark.hybrid                                    10  thrpt    5  21.151 ± 0.403  ops/us
   BitsetToArrayBenchmark.hybrid                                    20  thrpt    5  15.653 ± 0.263  ops/us
   BitsetToArrayBenchmark.hybrid                                    30  thrpt    5  12.431 ± 0.027  ops/us
   BitsetToArrayBenchmark.hybrid                                    40  thrpt    5  15.414 ± 0.032  ops/us
   BitsetToArrayBenchmark.hybrid                                    50  thrpt    5  15.415 ± 0.065  ops/us
   BitsetToArrayBenchmark.hybrid                                    60  thrpt    5  15.188 ± 0.806  ops/us
   BitsetToArrayBenchmark.whileLoop                                  5  thrpt    5  29.224 ± 0.503  ops/us
   BitsetToArrayBenchmark.whileLoop                                 10  thrpt    5  23.237 ± 0.697  ops/us
   BitsetToArrayBenchmark.whileLoop                                 20  thrpt    5  16.777 ± 0.278  ops/us
   BitsetToArrayBenchmark.whileLoop                                 30  thrpt    5  13.019 ± 0.213  ops/us
   BitsetToArrayBenchmark.whileLoop                                 40  thrpt    5  10.700 ± 0.095  ops/us
   BitsetToArrayBenchmark.whileLoop                                 50  thrpt    5   9.047 ± 0.015  ops/us
   BitsetToArrayBenchmark.whileLoop                                 60  thrpt    5   7.786 ± 0.224  ops/us
   ```
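
For readers without the benchmark source at hand, here is a minimal sketch of the two basic strategies the numbers contrast. It is not the PR's code: the class name `BitsetToArraySketch` and both method bodies are assumptions written for illustration, and they only borrow the `whileLoop` and `denseBranchLess` labels from the table. The first visits only the set bits of a 64-bit word, so its cost should grow with `bitCount`; the second visits all 64 positions with no data-dependent branch, so its cost should stay flat. That matches the shape of the results above, where `whileLoop` drops from about 29 to about 8 ops/us while `denseBranchLess` stays near 13.5 ops/us.

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from the Lucene PR.
final class BitsetToArraySketch {

  /** Visits only the set bits, so the work is proportional to the number of set bits. */
  static int whileLoop(long word, int base, int[] out) {
    int size = 0;
    while (word != 0L) {
      out[size++] = base + Long.numberOfTrailingZeros(word);
      word &= word - 1; // clear the lowest set bit
    }
    return size;
  }

  /** Visits all 64 positions unconditionally, so the work does not depend on the bit count. */
  static int denseBranchLess(long word, int base, int[] out) {
    int size = 0;
    for (int i = 0; i < Long.SIZE; i++) {
      out[size] = base + i;              // always store; overwritten later if bit i is clear
      size += (int) ((word >>> i) & 1L); // advance the write position only when bit i is set
    }
    return size;
  }

  public static void main(String[] args) {
    long word = 0b1010_0110L; // bits 1, 2, 5 and 7 are set
    int[] a = new int[Long.SIZE];
    int[] b = new int[Long.SIZE];
    int na = whileLoop(word, 64, a);
    int nb = denseBranchLess(word, 64, b);
    System.out.println(Arrays.toString(Arrays.copyOf(a, na))); // [65, 66, 69, 71]
    System.out.println(Arrays.toString(Arrays.copyOf(b, nb))); // [65, 66, 69, 71]
  }
}
```

Both methods need an output array with room for 64 entries per word. In the branchless variant, entries at and beyond the returned size hold scratch values and must be ignored by the caller.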

