rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793456342
Benchmarks for the Intel CPUs. There is one place I'd fix, if we could detect Sapphire Rapids and avoid scalar FMA. But I have no way to detect it based on what new features it has / what OpenJDK exposes at the moment. Otherwise performance is good.

Sapphire Rapids:
```
Main:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.871 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75  13.907 ± 0.266  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   4.275 ± 0.023  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  22.218 ± 0.759  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.819 ± 0.004  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  20.243 ± 0.352  ops/us

Patch:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.650 ± 0.002  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75  13.799 ± 0.233  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.612 ± 0.012  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  23.300 ± 1.079  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.884 ± 0.004  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  20.449 ± 0.446  ops/us
```

Ice Lake:
```
Main:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.547 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   9.842 ± 0.334  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   2.471 ± 0.002  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  13.452 ± 0.455  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   1.749 ± 0.004  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  11.813 ± 0.456  ops/us

Patch:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.528 ± 0.003  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   9.919 ± 0.345  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.314 ± 0.003  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  13.137 ± 0.155  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   3.248 ± 0.025  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  11.920 ± 0.469  ops/us
```

Cascade Lake:
```
Main:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.578 ± 0.005  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   8.907 ± 0.095  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   1.742 ± 0.003  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  13.935 ± 0.129  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   1.347 ± 0.005  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  12.526 ± 0.132  ops/us

Patch:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.641 ± 0.002  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   8.823 ± 0.114  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.401 ± 0.014  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  13.874 ± 0.116  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.629 ± 0.016  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  12.462 ± 0.123  ops/us
```

Haswell:
```
Main:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.728 ± 0.005  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   6.781 ± 0.071  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   1.730 ± 0.034  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  10.603 ± 0.351  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   1.398 ± 0.060  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75   9.470 ± 0.286  ops/us

Patch:
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.199 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   6.775 ± 0.083  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   2.465 ± 0.017  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  10.410 ± 0.300  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.299 ± 0.005  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75   9.117 ± 0.118  ops/us
```
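
For readers unfamiliar with the term, "scalar FMA" presumably means using a fused multiply-add in the non-vectorized loops. Below is a minimal sketch of the two scalar variants for a dot product, not the code from this PR: the class name, method names, and the `useFma` flag are all hypothetical. Only `Math.fma` is a real JDK call (available since Java 9).

```java
// Illustrative sketch only; class, method, and flag names are hypothetical,
// not taken from Lucene or from this PR.
final class ScalarDotProduct {

  /** Scalar dot product using Math.fma: a * b + acc with a single rounding. */
  static float dotProductFma(float[] a, float[] b) {
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc = Math.fma(a[i], b[i], acc);
    }
    return acc;
  }

  /** Scalar dot product using a separate multiply and add (two roundings). */
  static float dotProductPlain(float[] a, float[] b) {
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc += a[i] * b[i];
    }
    return acc;
  }

  /**
   * Picks a variant from a caller-supplied flag. How to set that flag per CPU
   * is exactly the open question in the comment above; this sketch does not
   * attempt any CPU detection.
   */
  static float dotProduct(float[] a, float[] b, boolean useFma) {
    return useFma ? dotProductFma(a, b) : dotProductPlain(a, b);
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f, 4f};
    float[] b = {5f, 6f, 7f, 8f};
    System.out.println(dotProduct(a, b, true));
    System.out.println(dotProduct(a, b, false));
  }
}
```

For general inputs the two variants can differ in the last bits, since the FMA form rounds once per step and the plain form rounds twice.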