Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

via GitHub Sat, 04 Nov 2023 07:24:43 -0700


rmuir commented on PR #12737:
URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793456342


   Benchmarks for the Intel CPUs. There is one place I'd fix if we could 
detect Sapphire Rapids: avoid scalar FMA there, since floatDotProductScalar 
gets slower with the patch on that CPU (4.275 -> 3.612 ops/us). But I have no 
way to detect it based on the new features it has or what OpenJDK exposes at 
the moment. Otherwise performance is good.
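
For readers unfamiliar with the trade-off, here is a minimal sketch of a scalar dot product written both ways: with a plain multiply-add and with `Math.fma`. It is not the code from this PR. The class name, method names, and the unroll factor of 4 are assumptions chosen for illustration only.

```java
// Illustrative sketch only; names and unroll factor are hypothetical.
final class ScalarDotSketch {
  private ScalarDotSketch() {}

  /** Unrolled dot product using separate multiply and add (two roundings per step). */
  public static float dotProductPlain(float[] a, float[] b) {
    checkLengths(a, b);
    // Four independent accumulators break the dependency chain between iterations.
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int bound = a.length & ~3; // largest multiple of 4 not exceeding the length
    for (; i < bound; i += 4) {
      acc0 += a[i] * b[i];
      acc1 += a[i + 1] * b[i + 1];
      acc2 += a[i + 2] * b[i + 2];
      acc3 += a[i + 3] * b[i + 3];
    }
    float res = (acc0 + acc1) + (acc2 + acc3);
    for (; i < a.length; i++) { // scalar tail
      res += a[i] * b[i];
    }
    return res;
  }

  /** Same loop shape, but each step is a fused multiply-add (one rounding per step). */
  public static float dotProductFma(float[] a, float[] b) {
    checkLengths(a, b);
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int bound = a.length & ~3;
    for (; i < bound; i += 4) {
      acc0 = Math.fma(a[i], b[i], acc0);
      acc1 = Math.fma(a[i + 1], b[i + 1], acc1);
      acc2 = Math.fma(a[i + 2], b[i + 2], acc2);
      acc3 = Math.fma(a[i + 3], b[i + 3], acc3);
    }
    float res = (acc0 + acc1) + (acc2 + acc3);
    for (; i < a.length; i++) { // scalar tail
      res = Math.fma(a[i], b[i], res);
    }
    return res;
  }

  private static void checkLengths(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
  }
}
```

If a CPU could be identified as one where scalar FMA is slower, a caller could pick between the two variants once at startup. The two variants can differ in the last bits of the result because FMA rounds once per step instead of twice.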
   
   Sapphire Rapids:
   ```
   Main:
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   0.871 ± 0.001  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  13.907 ± 0.266  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   4.275 ± 0.023  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  22.218 ± 0.759  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.819 ± 0.004  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  20.243 ± 0.352  ops/us
   
   Patch:
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.650 ± 0.002  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt   75  13.799 ± 0.233  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.612 ± 0.012  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  23.300 ± 1.079  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.884 ± 0.004  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  20.449 ± 0.446  ops/us
   ```
   
   Ice Lake:
   ```
   Main:
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   0.547 ± 0.001  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   9.842 ± 0.334  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   2.471 ± 0.002  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  13.452 ± 0.455  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   1.749 ± 0.004  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  11.813 ± 0.456  ops/us
   
   Patch:
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.528 ± 0.003  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   9.919 ± 0.345  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.314 ± 0.003  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  13.137 ± 0.155  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   3.248 ± 0.025  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  11.920 ± 0.469  ops/us
   ```
   
   Cascade Lake:
   ```
   Main:
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.578 ± 0.005  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   8.907 ± 0.095  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   1.742 ± 0.003  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  13.935 ± 0.129  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   1.347 ± 0.005  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  12.526 ± 0.132  ops/us
   
   Patch:
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.641 ± 0.002  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   8.823 ± 0.114  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.401 ± 0.014  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  13.874 ± 0.116  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.629 ± 0.016  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  12.462 ± 0.123  ops/us
   ```
   
   Haswell:
   ```
   Main:
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.728 ± 0.005  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   6.781 ± 0.071  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   1.730 ± 0.034  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  10.603 ± 0.351  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   1.398 ± 0.060  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt   75   9.470 ± 0.286  ops/us
   
   Patch:
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.199 ± 0.001  ops/us
   VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   6.775 ± 0.083  ops/us
   VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   2.465 ± 0.017  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  10.410 ± 0.300  ops/us
   VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.299 ± 0.005  ops/us
   VectorUtilBenchmark.floatSquareVector        1024  thrpt   75   9.117 ± 0.118  ops/us
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

