rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793362865
I tweaked the FMA logic for AMD cpus, to only avoid the high-latency scalar FMA where necessary. Should appease germans to get that extra ulp or whatever. sysprops default to "auto" so you can override however you want, without fear of involving BigDecimal :)

I can test the intel and arm families in the same way and try to tighten it up tomorrow.

AMD Zen4: EPYC 9R14 (family 0x19)
```
Main:
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   0.842 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  13.497 ± 0.171  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.540 ± 0.002  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  16.441 ± 0.424  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.540 ± 0.008  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  16.655 ± 0.575  ops/us

Patch:
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.763 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  13.477 ± 0.168  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.583 ± 0.003  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  16.438 ± 0.493  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   3.560 ± 0.009  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  15.778 ± 0.114  ops/us
```

AMD Zen3: EPYC 7R13 (family 0x19)
```
Main:
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   0.982 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  10.476 ± 0.026  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.246 ± 0.015  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  16.959 ± 0.480  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.298 ± 0.010  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  16.342 ± 0.508  ops/us

Patch:
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.344 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  10.445 ± 0.048  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.405 ± 0.006  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  16.486 ± 0.374  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.995 ± 0.002  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  16.374 ± 0.462  ops/us
```

AMD Zen2: EPYC 7R32 (family 0x17)
```
Main:
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   0.922 ± 0.005  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   8.519 ± 0.020  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   2.968 ± 0.020  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  15.950 ± 0.486  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.015 ± 0.012  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  15.894 ± 0.331  ops/us

Patch:
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.200 ± 0.005  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   8.520 ± 0.018  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.114 ± 0.021  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  15.671 ± 0.439  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.490 ± 0.030  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  15.189 ± 0.170  ops/us
```
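
For anyone skimming: the "auto" sysprop idea in the comment can be pictured roughly like the sketch below. This is only an illustration, not the code in the PR. The class name `FmaToggle`, the property name `example.useScalarFMA`, and the `detectedFast` flag (standing in for whatever CPU-family check the patch performs) are all made up for this example. The only real API used is `Math.fma`, available since Java 9.

```java
import java.util.Locale;

/** Hypothetical sketch of a tri-state FMA switch; every name here is illustrative. */
final class FmaToggle {

  /**
   * Resolves "auto" / "true" / "false" into a boolean.
   * "auto" (or an unset value) defers to the detection result passed in.
   */
  static boolean resolve(String value, boolean detectedFast) {
    if (value == null) {
      return detectedFast;
    }
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "auto":
        return detectedFast;
      case "true":
        return true;
      case "false":
        return false;
      default:
        throw new IllegalArgumentException("expected auto, true or false, got: " + value);
    }
  }

  /** Scalar dot product that uses Math.fma only when the switch says it is worthwhile. */
  static float dotProduct(float[] a, float[] b, boolean useFma) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      // fma rounds once; the plain form rounds after the multiply and again after the add
      acc = useFma ? Math.fma(a[i], b[i], acc) : a[i] * b[i] + acc;
    }
    return acc;
  }

  public static void main(String[] args) {
    boolean detectedFast = false; // assumption: placeholder for a real CPU check
    String prop = System.getProperty("example.useScalarFMA", "auto"); // hypothetical name
    boolean useFma = resolve(prop, detectedFast);
    float[] a = {1f, 2f, 3f};
    float[] b = {4f, 5f, 6f};
    System.out.println("useFma=" + useFma + " dot=" + dotProduct(a, b, useFma));
  }
}
```

Running with `-Dexample.useScalarFMA=true` or `=false` would force either path regardless of the detection result, which is the kind of override the comment describes.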