rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793231388
Vector results for this AMD CPU are unchanged by this PR.

Float-relevant performance info from avxturbo. This CPU doesn't downclock, but a 512-bit FMA has half the throughput of a 256-bit FMA (3700 vs. 7402 Mops), so I ran some experiments:

```
Cores | ID           | Description              | OVRLP3 | Mops | A/M-ratio | A/M-MHz | M/tsc-ratio
    1 | avx128_fma_t | 128-bit parallel DP FMAs |  1.000 | 7402 |      1.42 |    3700 |        1.00
    1 | avx256_fma_t | 256-bit parallel DP FMAs |  1.000 | 7402 |      1.42 |    3700 |        1.00
    1 | avx512_fma_t | 512-bit parallel DP FMAs |  1.000 | 3700 |      1.42 |    3700 |        1.00
```

Float:
`INFO: Java vector incubator API enabled; uses preferredBitSize=512; FMA enabled`

```
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75  13.397 ± 0.205  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  16.226 ± 0.434  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  16.147 ± 0.394  ops/us
```

Float (avoiding AVX-512 entirely by passing `-XX:MaxVectorSize=32`, i.e. 32-byte/256-bit vectors):
`INFO: Java vector incubator API enabled; uses preferredBitSize=256; FMA enabled`

```
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75  11.234 ± 0.041  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  17.045 ± 0.436  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  16.876 ± 0.351  ops/us
```

Binary-relevant performance info from avxturbo:

```
Cores | ID          | Description                    | OVRLP3 | Mops | A/M-ratio | A/M-MHz | M/tsc-ratio
    1 | avx128_imul | 128-bit integer muls (vpmuldq) |  1.000 | 1233 |      1.42 |    3700 |        1.00
    1 | avx256_imul | 256-bit integer muls (vpmuldq) |  1.000 | 1233 |      1.42 |    3700 |        1.00
    1 | avx512_imul | 512-bit integer muls (vpmuldq) |  1.000 | 1233 |      1.42 |    3700 |        1.00
```

Binary:
`INFO: Java vector incubator API enabled; uses preferredBitSize=512; FMA enabled`

```
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   8.769 ± 0.083  ops/us
VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  22.362 ± 0.054  ops/us
VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  18.080 ± 0.171  ops/us
```

Binary (512-bit vectors, but disabling the Intel-specific downclock protection, i.e. doing 32-bit `vpmul`):
`INFO: Java vector incubator API enabled; uses preferredBitSize=512; FMA enabled`

```
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15  10.669 ± 0.242  ops/us
VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  21.148 ± 0.087  ops/us
VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  18.048 ± 0.142  ops/us
```

Binary (avoiding AVX-512 entirely by passing `-XX:MaxVectorSize=32`, i.e. 32-byte/256-bit vectors):
`INFO: Java vector incubator API enabled; uses preferredBitSize=256; FMA enabled`

```
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   8.773 ± 0.006  ops/us
VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  17.484 ± 0.022  ops/us
VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  14.930 ± 0.018  ops/us
```
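As a back-of-envelope check on the two avxturbo tables, here is a small sketch that converts the Mops column into lane operations. It assumes Mops counts vector instructions per second (my reading of avxturbo's output, not something stated above), that a DP FMA covers width/64 lanes, and that `vpmuldq` produces one 64-bit product per 64-bit lane. The class and method names are hypothetical.

```java
// Back-of-envelope: convert avxturbo "Mops" into lane operations.
// Assumption: Mops = millions of vector instructions per second.
final class LaneThroughput {

  // A DP FMA operates on one 64-bit double per lane: width / 64 lanes.
  static int doubleLanes(int vectorBits) {
    return vectorBits / Double.SIZE;
  }

  // vpmuldq multiplies the low signed 32 bits of each 64-bit lane,
  // producing one 64-bit product per 64-bit lane: width / 64 products.
  static int vpmuldqProducts(int vectorBits) {
    return vectorBits / Long.SIZE;
  }

  // Millions of lane operations per second = instruction rate * lanes per instruction.
  static long laneMops(long instructionMops, int lanesPerInstruction) {
    return instructionMops * lanesPerInstruction;
  }

  public static void main(String[] args) {
    int[] widths = {128, 256, 512};
    long[] fmaMops = {7402, 7402, 3700};  // avx{128,256,512}_fma_t rows
    long[] imulMops = {1233, 1233, 1233}; // avx{128,256,512}_imul rows
    for (int i = 0; i < widths.length; i++) {
      System.out.printf(
          "%d-bit: %d M DP FMA lanes/s, %d M vpmuldq products/s%n",
          widths[i],
          laneMops(fmaMops[i], doubleLanes(widths[i])),
          laneMops(imulMops[i], vpmuldqProducts(widths[i])));
    }
  }
}
```

On these numbers, 256-bit and 512-bit FMAs reach about the same DP lane rate (29608 vs. 29600 M lanes/s), while a 512-bit `vpmuldq` does twice the multiplies of a 256-bit one at the same instruction rate in this test. That lines up with binaryDotProductVector and binarySquareVector gaining from 512-bit vectors while floatDotProductVector and floatSquareVector do not.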