rmuir commented on PR #12737:
URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793488056
Here are the ARM results. I had to tweak the ARM path to use FMA more aggressively to fully utilize the Gravitons. The problem there is just Apple Silicon; it's good we did not move forward based solely on benchmarks from some Macs. You may not like my detector, but I think it is quite practical and prevents slow execution.

Graviton 3
```
Main:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.682 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   5.500 ± 0.004  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   2.411 ± 0.037  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  11.522 ± 0.234  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.169 ± 0.005  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75   8.632 ± 0.084  ops/us

Patch:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.422 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   6.911 ± 0.039  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.751 ± 0.007  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  11.498 ± 0.418  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   3.202 ± 0.007  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  10.795 ± 0.154  ops/us
```

Graviton 2
```
Main:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.647 ± 0.002  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   2.599 ± 0.002  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   1.430 ± 0.007  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75   6.192 ± 0.098  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   1.194 ± 0.003  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75   4.797 ± 0.088  ops/us

Patch:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.571 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   5.408 ± 0.013  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   2.055 ± 0.066  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75   6.673 ± 0.260  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   1.753 ± 0.001  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75   6.179 ± 0.070  ops/us
```

Mac M1
```
Main:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.077 ± 0.002  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   7.651 ± 0.032  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.606 ± 0.032  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  16.296 ± 0.268  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   3.197 ± 0.001  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  14.185 ± 0.099  ops/us

Patch:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   2.062 ± 0.006  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   7.644 ± 0.030  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   4.273 ± 0.003  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  16.110 ± 0.283  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   3.770 ± 0.007  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  14.184 ± 0.100  ops/us
```
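As a rough illustration of gating fused multiply-add behind a runtime detector, here is a minimal Java sketch. It assumes Java 9+ for `Math.fma(float, float, float)` and is not the implementation from this PR. `FmaSketch`, `preferFma`, and the `sketch.useFma` property are hypothetical names. The rule "FMA on aarch64 except macOS" is only inferred from the ARM numbers above. Other architectures fall back to plain multiply-add because these benchmarks say nothing about them.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not Lucene's actual code): a tiny FMA "detector" plus a
 * scalar dot product that uses Math.fma (Java 9+) only when the detector allows it.
 */
class FmaSketch {

  /**
   * Heuristic inferred only from the ARM numbers in this comment: prefer FMA on
   * aarch64 (e.g. Graviton 2/3) but not on macOS/aarch64 (Apple Silicon). Any
   * other architecture returns false because these benchmarks don't cover it.
   * A non-null override ("true"/"false") always wins.
   */
  static boolean preferFma(String osArch, String osName, String override) {
    if (override != null) {
      return Boolean.parseBoolean(override);
    }
    boolean isAarch64 = "aarch64".equals(osArch);
    boolean isMac = osName != null && osName.toLowerCase(Locale.ROOT).startsWith("mac");
    return isAarch64 && !isMac;
  }

  /** Decided once per JVM; "sketch.useFma" is a made-up override property. */
  static final boolean USE_FMA =
      preferFma(
          System.getProperty("os.arch"),
          System.getProperty("os.name"),
          System.getProperty("sketch.useFma"));

  /** Returns acc + x * y, fused (single rounding) when useFma is true. */
  private static float mulAdd(float x, float y, float acc, boolean useFma) {
    return useFma ? Math.fma(x, y, acc) : acc + x * y;
  }

  /** Dot product with four independent accumulators to shorten dependency chains. */
  static float dotProduct(float[] a, float[] b, boolean useFma) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int upperBound = a.length & ~3; // largest multiple of 4 that is <= length
    for (; i < upperBound; i += 4) {
      acc0 = mulAdd(a[i], b[i], acc0, useFma);
      acc1 = mulAdd(a[i + 1], b[i + 1], acc1, useFma);
      acc2 = mulAdd(a[i + 2], b[i + 2], acc2, useFma);
      acc3 = mulAdd(a[i + 3], b[i + 3], acc3, useFma);
    }
    float res = (acc0 + acc1) + (acc2 + acc3);
    for (; i < a.length; i++) { // tail elements that don't fill a group of 4
      res = mulAdd(a[i], b[i], res, useFma);
    }
    return res;
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f, 4f, 5f};
    float[] b = {6f, 7f, 8f, 9f, 10f};
    System.out.println("useFma=" + USE_FMA + ", dot=" + dotProduct(a, b, USE_FMA));
  }
}
```

Keeping four partial sums lets consecutive multiply-adds proceed independently instead of each one waiting on the previous result. `Math.fma` rounds once instead of twice, so fused and unfused results can differ in the last bit for inputs that aren't small integers. Whether either choice is actually faster on a given CPU is what benchmarks like the ones above have to decide.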
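To compare the machines at a glance, a quick summary is the patch/main throughput ratio for each `*Vector` benchmark. The sketch below simply divides the scores copied from the tables (ops/us, higher is better). `VectorSpeedups` and `speedup` are illustrative names, and the ratios ignore the reported error margins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Computes patch/main throughput ratios for the *Vector benchmarks in this comment. */
class VectorSpeedups {

  /** A ratio above 1 means the patch has higher throughput (ops/us) than main. */
  static double speedup(double mainScore, double patchScore) {
    return patchScore / mainScore;
  }

  public static void main(String[] args) {
    String[] names = {"floatCosineVector", "floatDotProductVector", "floatSquareVector"};
    // Each row is {main, patch} in ops/us, in the same order as `names`.
    Map<String, double[][]> results = new LinkedHashMap<>();
    results.put("Graviton 3", new double[][] {{5.500, 6.911}, {11.522, 11.498}, {8.632, 10.795}});
    results.put("Graviton 2", new double[][] {{2.599, 5.408}, {6.192, 6.673}, {4.797, 6.179}});
    results.put("Mac M1", new double[][] {{7.651, 7.644}, {16.296, 16.110}, {14.185, 14.184}});

    for (Map.Entry<String, double[][]> e : results.entrySet()) {
      for (int i = 0; i < names.length; i++) {
        double[] row = e.getValue()[i];
        System.out.printf("%-10s %-22s %.2fx%n", e.getKey(), names[i], speedup(row[0], row[1]));
      }
    }
  }
}
```

For example, Graviton 2's floatCosineVector goes from 2.599 to 5.408 ops/us, roughly 2.08x. The three M1 vector ratios all stay between about 0.99 and 1.00. Differences smaller than the ± error column, such as Graviton 3's floatDotProductVector (11.522 vs 11.498), are within noise.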