shubhamvishu commented on PR #15508:
URL: https://github.com/apache/lucene/pull/15508#issuecomment-3663515114

   Sharing some JMH benchmark results from Graviton 2 and Graviton 3 with this PR:
   
   `dot8sNative`: native dot product implementation (SVE, NEON, or scalar)
   `dot8sNativeSimple`: simple scalar `for` loop, leaving GCC to auto-vectorize it for the native architecture
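
   For readers unfamiliar with the "simple" variant, here is a minimal sketch of what such a scalar loop could look like. The function name `dot8s_simple` and its signature are hypothetical and are not taken from the PR. The sketch assumes signed 8-bit inputs accumulated into a 32-bit integer, and that the build uses optimization flags such as `-O3 -mcpu=native` (an assumption, not confirmed here) so that GCC is free to auto-vectorize the loop.

   ```c
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical illustration only: a plain scalar dot product over
    * signed 8-bit vectors. There are no intrinsics here; any use of
    * NEON or SVE is left entirely to the compiler's auto-vectorizer. */
   int32_t dot8s_simple(const int8_t *a, const int8_t *b, size_t n) {
       int32_t sum = 0;
       for (size_t i = 0; i < n; i++) {
           /* Widen each operand before multiplying so the product
            * is computed in 32 bits rather than truncated to 8. */
           sum += (int32_t)a[i] * (int32_t)b[i];
       }
       return sum;
   }
   ```

   Whether the compiler actually emits vector instructions for a loop like this depends on the GCC version and flags, so the generated assembly is worth checking before drawing conclusions from the numbers below.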
   
   **Graviton 2**
   
   ```
   Benchmark                                   (size)   Mode  Cnt    Score   Error   Units
   VectorUtilBenchmark.binaryDotProductVector       1  thrpt   15  208.089 ± 0.017  ops/us
   VectorUtilBenchmark.binaryDotProductVector     128  thrpt   15   15.288 ± 0.071  ops/us
   VectorUtilBenchmark.binaryDotProductVector     207  thrpt   15    9.948 ± 0.063  ops/us
   VectorUtilBenchmark.binaryDotProductVector     256  thrpt   15    8.326 ± 0.030  ops/us
   VectorUtilBenchmark.binaryDotProductVector     300  thrpt   15    7.063 ± 0.050  ops/us
   VectorUtilBenchmark.binaryDotProductVector     512  thrpt   15    4.311 ± 0.023  ops/us
   VectorUtilBenchmark.binaryDotProductVector     702  thrpt   15    3.198 ± 0.026  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15    2.220 ± 0.020  ops/us
   VectorUtilBenchmark.dot8sNative                  1  thrpt   15   86.070 ± 0.014  ops/us
   VectorUtilBenchmark.dot8sNative                128  thrpt   15   84.463 ± 3.198  ops/us
   VectorUtilBenchmark.dot8sNative                207  thrpt   15   49.372 ± 1.150  ops/us
   VectorUtilBenchmark.dot8sNative                256  thrpt   15   70.491 ± 0.226  ops/us
   VectorUtilBenchmark.dot8sNative                300  thrpt   15   44.338 ± 0.611  ops/us
   VectorUtilBenchmark.dot8sNative                512  thrpt   15   43.895 ± 4.055  ops/us
   VectorUtilBenchmark.dot8sNative                702  thrpt   15   27.977 ± 1.614  ops/us
   VectorUtilBenchmark.dot8sNative               1024  thrpt   15   27.598 ± 0.102  ops/us
   VectorUtilBenchmark.dot8sNativeSimple            1  thrpt   15  103.949 ± 0.216  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          128  thrpt   15   89.996 ± 3.834  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          207  thrpt   15   52.342 ± 0.909  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          256  thrpt   15   66.461 ± 3.691  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          300  thrpt   15   49.841 ± 1.630  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          512  thrpt   15   47.575 ± 0.230  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          702  thrpt   15   30.196 ± 2.045  ops/us
   VectorUtilBenchmark.dot8sNativeSimple         1024  thrpt   15   26.870 ± 2.711  ops/us
   ```
   
   **Graviton 3**
   
   ```
   Benchmark                                   (size)   Mode  Cnt    Score    Error   Units
   VectorUtilBenchmark.binaryDotProductVector       1  thrpt   15  449.506 ±  4.604  ops/us
   VectorUtilBenchmark.binaryDotProductVector     128  thrpt   15   60.230 ±  0.007  ops/us
   VectorUtilBenchmark.binaryDotProductVector     207  thrpt   15   36.598 ±  0.085  ops/us
   VectorUtilBenchmark.binaryDotProductVector     256  thrpt   15   31.289 ±  0.006  ops/us
   VectorUtilBenchmark.binaryDotProductVector     300  thrpt   15   26.704 ±  0.044  ops/us
   VectorUtilBenchmark.binaryDotProductVector     512  thrpt   15   15.934 ±  0.001  ops/us
   VectorUtilBenchmark.binaryDotProductVector     702  thrpt   15   11.607 ±  0.007  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15    8.041 ±  0.001  ops/us
   VectorUtilBenchmark.dot8sNative                  1  thrpt   15  191.014 ±  2.466  ops/us
   VectorUtilBenchmark.dot8sNative                128  thrpt   15  134.566 ±  2.626  ops/us
   VectorUtilBenchmark.dot8sNative                207  thrpt   15  105.161 ±  6.314  ops/us
   VectorUtilBenchmark.dot8sNative                256  thrpt   15   93.163 ±  3.352  ops/us
   VectorUtilBenchmark.dot8sNative                300  thrpt   15   90.764 ±  7.961  ops/us
   VectorUtilBenchmark.dot8sNative                512  thrpt   15   67.553 ±  1.328  ops/us
   VectorUtilBenchmark.dot8sNative                702  thrpt   15   51.275 ±  3.981  ops/us
   VectorUtilBenchmark.dot8sNative               1024  thrpt   15   40.886 ±  2.880  ops/us
   VectorUtilBenchmark.dot8sNativeSimple            1  thrpt   15  221.399 ±  4.357  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          128  thrpt   15  162.158 ±  5.077  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          207  thrpt   15  119.323 ± 12.108  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          256  thrpt   15  111.288 ±  3.256  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          300  thrpt   15   90.587 ± 11.066  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          512  thrpt   15   58.725 ±  3.419  ops/us
   VectorUtilBenchmark.dot8sNativeSimple          702  thrpt   15   47.595 ±  3.692  ops/us
   VectorUtilBenchmark.dot8sNativeSimple         1024  thrpt   15   36.442 ±  0.377  ops/us
   ```

