uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787740992
Hi, on my older Ryzen it is faster with FMA enabled (I downgraded your branch and also verified that the system prints "FMA enabled"). Here is my full benchmark:

```
main, AMD Ryzen 7 3700X 8-Core Processor
INFO: Java vector incubator API enabled; uses preferredBitSize=256; FMA enabled
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt    5   1.155 ± 0.012  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt    5  10.602 ± 0.213  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt    5   3.675 ± 0.010  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt    5  18.656 ± 0.109  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt    5   2.598 ± 0.023  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt    5  20.843 ± 0.205  ops/us

Robert latest, no FMA, AMD Ryzen 7 3700X 8-Core Processor
INFO: Java vector incubator API enabled; uses preferredBitSize=256
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt    5   1.487 ± 0.015  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt    5  11.810 ± 0.336  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt    5   3.910 ± 0.126  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt    5  18.885 ± 0.238  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt    5   3.067 ± 0.049  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt    5  16.874 ± 0.325  ops/us

Robert 93fed5fe22bec39a7a3683f50b4632756f4b1c13, enforced FMA, AMD Ryzen 7 3700X 8-Core Processor
INFO: Java vector incubator API enabled; uses preferredBitSize=256; FMA enabled
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt    5   1.562 ± 0.016  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt    5  10.669 ± 0.188  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt    5   3.206 ± 0.018  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt    5  18.474 ± 0.071  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt    5   3.125 ± 0.119  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt    5  20.247 ± 0.415  ops/us
```

For completeness, here is my Intel laptop:

```
main, Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz, 1992 MHz
INFORMATION: Java vector incubator API enabled; uses preferredBitSize=256; FMA enabled
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt    5   0,712 ± 0,354  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt    5   9,134 ± 0,204  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt    5   2,383 ± 0,125  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt    5  14,116 ± 1,304  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt    5   1,663 ± 0,062  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt    5  14,735 ± 0,258  ops/us

Robert, Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz, 1992 MHz
INFORMATION: Java vector incubator API enabled; uses preferredBitSize=256; FMA enabled
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt    5   1,840 ± 0,021  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt    5   9,253 ± 0,296  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt    5   3,160 ± 0,449  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt    5  13,943 ± 1,281  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt    5   2,140 ± 1,072  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt    5  12,089 ± 4,551  ops/us
```

So in my opinion, FMA is fine (as it is more precise). Just because one CPU gets slower with it, we cannot say "all AMD are bad".
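To illustrate the "more precise" point, here is a minimal Java sketch that is not taken from the PR. The class and method names (`FmaPrecisionSketch`, `unfused`, `fused`) are hypothetical; only `Math.fma` is a real JDK method (Java 9+). It compares a multiply-then-add, which rounds twice, with a fused multiply-add, which rounds once:

```java
public final class FmaPrecisionSketch {

  // Two roundings: a * b is rounded to float, then the sum is rounded again.
  static float unfused(float a, float b, float c) {
    return a * b + c;
  }

  // One rounding: a * b + c is computed exactly, then rounded once.
  static float fused(float a, float b, float c) {
    return Math.fma(a, b, c);
  }

  public static void main(String[] args) {
    // a * a is exactly 1 + 2^-11 + 2^-24, which does not fit in a float:
    // it lies halfway between two neighbours and rounds to 1 + 2^-11.
    float a = 1.0f + 0x1p-12f;
    float c = -(1.0f + 0x1p-11f);

    // The low-order term 2^-24 is lost before the addition happens.
    System.out.println("unfused: " + unfused(a, a, c));

    // The fused version keeps the exact product, so 2^-24 survives.
    System.out.println("fused:   " + fused(a, a, c));
  }
}
```

The inputs are chosen so that the product falls exactly on a rounding tie. With typical data the difference is much smaller, but it accumulates over the 1024 elements used in the benchmarks above.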