rmuir commented on PR #12787:
URL: https://github.com/apache/lucene/pull/12787#issuecomment-1803343093

   Running `make PATCH_BRANCH=rmuir:microbenchmark_ec2` shows no differences 
between main and patch, but it demonstrates the workflow (sorry: no speedups in 
this branch!).
   
   It spins up and tears down a `lucene-jmh` CloudFormation stack containing all 
the instances, so everything stays organized in your account. If something goes 
wrong with the script, just delete the entire stack from the AWS console 
yourself. For automation or concurrent runs, change the stack name to, e.g., an 
environment variable holding the job ID; that also gives good separation 
without the overhead and limits of separate VPCs.
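
A minimal sketch of deriving a per-job stack name, assuming the CI system exposes the job ID in an environment variable. The variable name `JOB_ID` and the helper `stack_name_for` are hypothetical, and how the resulting name is passed to the script is not shown here. The only hard facts used are the CloudFormation naming rules: letters, digits and hyphens only, starting with a letter, at most 128 characters.

```python
import os
import re


def stack_name_for(job_id: str, prefix: str = "lucene-jmh") -> str:
    """Build a CloudFormation-safe stack name from a CI job ID (hypothetical helper)."""
    # Stack names may contain only letters, digits and hyphens,
    # so collapse every other run of characters into a single hyphen.
    safe_id = re.sub(r"[^A-Za-z0-9-]+", "-", job_id).strip("-")
    name = f"{prefix}-{safe_id}" if safe_id else prefix
    # Stack names are limited to 128 characters.
    return name[:128]


if __name__ == "__main__":
    # JOB_ID is an assumed variable name; substitute whatever your CI provides.
    print(stack_name_for(os.environ.get("JOB_ID", "")))
```

With one stack per job, a failed run can be cleaned up by deleting only that job's stack.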
   
   ![Screen_Shot_2023-11-09_at_02 34 
36](https://github.com/apache/lucene/assets/504194/67185b22-2a99-4737-ad6e-bb69ad2108c2)
   
   There are a few minutes of overhead spinning up and configuring the 
machines, but at least it all happens in parallel. The overhead is mostly 
dominated by `./gradlew assemble`, which takes the longest.
   
   Actually running all the JMH benchmarks takes about 30 minutes; it is what 
it is. That time is not overhead from this script. If you want to run a 
specific one, just use `JMH_ARGS` (see the README) and you'll get results much 
faster.
   
   Here is the breakdown of time spent:
   
   
   |Task|Time|
   |-------|-------|
   |Run benchmark|1502.06s|
   |Assemble Sources|285.18s|
   |Reboot machine|33.89s|
   |Create cloudformation stack|26.70s|
   |Install packages|21.36s|
   |Download JDK|20.57s|
   |Checkout main|10.46s|
   |Checkout patch|9.39s|
   |Wait for connection|7.36s|
   |Gather facts|2.12s|
   |Write Report|2.02s|
   |Configure kernel|1.17s|
   |Lookup default VPC|0.98s|
   |Gather instance details|0.96s|
   |Read main results|0.67s|
   |Configure JDK|0.59s|
   |Configure Gradle|0.46s|
   |Read patch results|0.35s|
   |Add instances to inventory|0.24s|
   |Create combined report|0.21s|
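
As a quick check on where the time goes, the table can be totalled with a few lines of Python. This is only a sketch: the numbers are copied from the table, and summing them assumes the listed tasks run one after another, which the table itself does not state.

```python
# Task timings in seconds, copied from the breakdown table.
timings = {
    "Run benchmark": 1502.06,
    "Assemble Sources": 285.18,
    "Reboot machine": 33.89,
    "Create cloudformation stack": 26.70,
    "Install packages": 21.36,
    "Download JDK": 20.57,
    "Checkout main": 10.46,
    "Checkout patch": 9.39,
    "Wait for connection": 7.36,
    "Gather facts": 2.12,
    "Write Report": 2.02,
    "Configure kernel": 1.17,
    "Lookup default VPC": 0.98,
    "Gather instance details": 0.96,
    "Read main results": 0.67,
    "Configure JDK": 0.59,
    "Configure Gradle": 0.46,
    "Read patch results": 0.35,
    "Add instances to inventory": 0.24,
    "Create combined report": 0.21,
}

total = sum(timings.values())
print(f"total: {total:.2f}s ({total / 60:.1f} min)")

# Show each task's share of the total, largest first.
for task, secs in sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{task:<30} {secs:>8.2f}s  {100 * secs / total:5.1f}%")
```

Under that assumption the benchmark run itself accounts for roughly 78% of the total, and `Assemble Sources` for most of the rest.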
   
   
   A full benchmark run costs less than $1 USD, I am pretty sure.
   
   Output format is "big ass PR comment" format. I know it's not the best, but 
I don't feel like parsing JSON.
   
   Output from a run against this branch:
   
   cascadelake: `['0', 'GenuineIntel', 'Intel(R) Xeon(R) Platinum 8275CL CPU @ 
3.00GHz', '1', 'GenuineIntel', 'Intel(R) Xeon(R) Platinum 8275CL CPU @ 
3.00GHz']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.932 ± 0.002  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   6.390 ± 0.014  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.506 ± 0.005  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  13.929 ± 0.006  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.920 ± 0.030  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  11.088 ± 0.124  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.639 ± 0.007  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   8.896 ± 0.070  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.388 ± 0.046  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  13.770 ± 0.079  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.625 ± 0.011  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  12.385 ± 0.131  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.932 ± 0.002  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   6.396 ± 0.008  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.505 ± 0.005  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  13.759 ± 0.246  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.928 ± 0.005  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  11.137 ± 0.126  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.638 ± 0.007  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   9.473 ± 0.203  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.385 ± 0.046  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  13.900 ± 0.114  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.629 ± 0.002  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  12.676 ± 0.262  ops/us
   ```
   
   graviton2: `['0', '1']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt  Score    Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15  0.808 ±  0.001  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15  1.254 ±  0.001  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15  2.386 ±  0.001  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  2.309 ±  0.016  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15  1.909 ±  0.002  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  1.861 ±  0.002  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15  1.569 ±  0.001  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  5.376 ±  0.054  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15  2.072 ±  0.070  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  6.409 ±  0.172  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15  1.752 ±  0.001  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  6.141 ±  0.038  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt  Score    Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15  0.808 ±  0.001  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15  1.254 ±  0.001  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15  2.386 ±  0.001  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  2.327 ±  0.002  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15  1.911 ±  0.004  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  1.862 ±  0.001  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15  1.570 ±  0.001  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  5.388 ±  0.049  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15  2.097 ±  0.036  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  6.332 ±  0.113  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15  1.752 ±  0.001  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  6.135 ±  0.042  ops/us
   ```
   
   graviton3: `['0', '1']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt  Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15  0.842 ± 0.001  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15  4.421 ± 0.009  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15  2.370 ± 0.001  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  6.954 ± 0.011  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15  2.466 ± 0.026  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  5.848 ± 0.037  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15  1.422 ± 0.001  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  6.272 ± 0.025  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15  3.739 ± 0.057  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  9.828 ± 0.224  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15  3.182 ± 0.045  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  9.125 ± 0.045  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt  Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15  0.842 ± 0.001  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15  4.423 ± 0.002  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15  2.370 ± 0.001  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  6.960 ± 0.007  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15  2.471 ± 0.020  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  5.849 ± 0.011  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15  1.421 ± 0.001  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  6.264 ± 0.029  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15  3.755 ± 0.003  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  9.877 ± 0.250  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15  3.207 ± 0.012  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  9.113 ± 0.044  ops/us
   ```
   
   haswell: `['0', 'GenuineIntel', 'Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz', 
'1', 'GenuineIntel', 'Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.728 ± 0.008  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   3.586 ± 0.008  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.011 ± 0.004  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15   7.915 ± 0.011  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.539 ± 0.003  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15   6.939 ± 0.009  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.308 ± 0.005  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   7.453 ± 0.067  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   2.154 ± 0.046  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  12.245 ± 0.117  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.351 ± 0.061  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  11.299 ± 0.219  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.730 ± 0.002  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   3.586 ± 0.007  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.012 ± 0.004  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15   7.919 ± 0.009  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.539 ± 0.003  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15   6.941 ± 0.005  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.305 ± 0.011  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   7.488 ± 0.068  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   2.153 ± 0.047  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  12.292 ± 0.101  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.368 ± 0.001  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  11.387 ± 0.202  ops/us
   ```
   
   icelake: `['0', 'GenuineIntel', 'Intel(R) Xeon(R) Platinum 8375C CPU @ 
2.90GHz', '1', 'GenuineIntel', 'Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.842 ± 0.004  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   7.346 ± 0.008  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.585 ± 0.007  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  16.481 ± 0.030  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.807 ± 0.010  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  14.150 ± 0.077  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.524 ± 0.004  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   9.790 ± 0.354  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.305 ± 0.021  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  13.135 ± 0.152  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   3.243 ± 0.005  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  11.529 ± 0.291  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.843 ± 0.004  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   7.351 ± 0.014  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.586 ± 0.004  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  16.410 ± 0.071  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.809 ± 0.008  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  14.090 ± 0.067  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.525 ± 0.005  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   9.872 ± 0.346  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.304 ± 0.020  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  13.162 ± 0.151  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   3.239 ± 0.005  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  11.574 ± 0.345  ops/us
   ```
   
   sapphirerapids: `['0', 'GenuineIntel', 'Intel(R) Xeon(R) Platinum 8488C', 
'1', 'GenuineIntel', 'Intel(R) Xeon(R) Platinum 8488C']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   1.134 ± 0.001  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   8.877 ± 0.004  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.826 ± 0.012  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  18.573 ± 0.009  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   2.701 ± 0.050  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  16.520 ± 0.294  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.702 ± 0.004  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  14.571 ± 0.251  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.715 ± 0.018  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  22.118 ± 0.619  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.933 ± 0.011  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  21.969 ± 0.103  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   1.121 ± 0.001  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   8.738 ± 0.042  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.797 ± 0.025  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  18.222 ± 0.009  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   2.663 ± 0.003  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  16.051 ± 0.295  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.680 ± 0.014  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  14.678 ± 0.255  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.473 ± 0.297  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  18.335 ± 0.435  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.518 ± 0.001  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  16.154 ± 0.268  ops/us
   ```
   
   zen2: `['0', 'AuthenticAMD', 'AMD EPYC 7R32', '1', 'AuthenticAMD', 'AMD EPYC 
7R32']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.497 ± 0.003  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   3.771 ± 0.012  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   1.543 ± 0.009  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15   9.977 ± 0.004  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.276 ± 0.002  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15   9.034 ± 0.024  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.184 ± 0.007  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   8.380 ± 0.025  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.089 ± 0.022  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  15.269 ± 0.401  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.449 ± 0.031  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  15.326 ± 0.297  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.493 ± 0.003  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   3.769 ± 0.022  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   1.548 ± 0.008  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15   9.918 ± 0.038  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.273 ± 0.004  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15   9.008 ± 0.026  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.187 ± 0.017  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   8.411 ± 0.025  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.086 ± 0.037  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  15.775 ± 0.494  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.482 ± 0.009  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  14.919 ± 0.081  ops/us
   ```
   
   zen3: `['0', 'AuthenticAMD', 'AMD EPYC 7R13 Processor', '1', 'AuthenticAMD', 
'AMD EPYC 7R13 Processor']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.785 ± 0.004  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   5.453 ± 0.029  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   1.579 ± 0.001  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15   9.803 ± 0.010  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.269 ± 0.006  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15   9.349 ± 0.012  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.346 ± 0.008  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  10.494 ± 0.060  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.395 ± 0.019  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  16.544 ± 0.326  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   3.004 ± 0.002  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  16.070 ± 0.233  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.779 ± 0.010  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   5.463 ± 0.004  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   1.578 ± 0.002  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15   9.790 ± 0.037  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.271 ± 0.002  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15   9.363 ± 0.010  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.347 ± 0.006  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  10.492 ± 0.033  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.400 ± 0.015  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  16.568 ± 0.405  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   3.007 ± 0.001  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  16.719 ± 0.441  ops/us
   ```
   
   zen4: `['0', 'AuthenticAMD', 'AMD EPYC 9R14', '1', 'AuthenticAMD', 'AMD EPYC 
9R14']`
   
   main
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.668 ± 0.003  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   8.784 ± 0.094  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   1.856 ± 0.002  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  22.390 ± 0.071  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.542 ± 0.001  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  18.104 ± 0.055  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.763 ± 0.001  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  13.427 ± 0.146  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.579 ± 0.014  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  16.396 ± 0.477  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   3.561 ± 0.004  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  16.581 ± 0.494  ops/us
   ```
   patch
   ```
   Benchmark                                   (size)   Mode  Cnt   Score    Error   Units
   VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.669 ±  0.004  ops/us
   VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   8.773 ±  0.092  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   1.855 ±  0.003  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15  22.408 ±  0.044  ops/us
   VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   1.540 ±  0.001  ops/us
   VectorUtilBenchmark.binarySquareVector        1024  thrpt   15  18.165 ±  0.130  ops/us
   VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   1.763 ±  0.001  ops/us
   VectorUtilBenchmark.floatCosineVector         1024  thrpt   75  13.461 ±  0.147  ops/us
   VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   3.580 ±  0.014  ops/us
   VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  15.989 ±  0.319  ops/us
   VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   3.562 ±  0.002  ops/us
   VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  16.091 ±  0.477  ops/us
   ```
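
The main/patch tables above can be compared without touching JSON by parsing JMH's plain-text rows directly. The following is a rough sketch, not part of the script in this PR: the helper names `parse_jmh` and `compare` are made up for illustration, it assumes each result row sits on a single line, and treating overlapping `Score ± Error` intervals as "no clear difference" is only a crude heuristic. The sample rows are two of the zen4 results.

```python
import re

# One JMH text-report row: name, (size), mode, count, score ± error, units.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<size>\d+)\s+(?P<mode>\S+)\s+(?P<cnt>\d+)"
    r"\s+(?P<score>[\d.]+)\s+±\s+(?P<error>[\d.]+)\s+(?P<units>\S+)\s*$"
)


def parse_jmh(text):
    """Map benchmark name -> (score, error); non-matching lines are skipped."""
    results = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            results[m["name"]] = (float(m["score"]), float(m["error"]))
    return results


def compare(main_text, patch_text):
    """Return (name, percent change, intervals_overlap) for shared benchmarks."""
    main, patch = parse_jmh(main_text), parse_jmh(patch_text)
    rows = []
    for name in sorted(main.keys() & patch.keys()):
        (m_score, m_err), (p_score, p_err) = main[name], patch[name]
        change = 100.0 * (p_score - m_score) / m_score
        # Heuristic: if the error intervals overlap, treat the change as noise.
        overlap = abs(p_score - m_score) <= m_err + p_err
        rows.append((name, change, overlap))
    return rows


MAIN = """
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.668 ± 0.003  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  16.396 ± 0.477  ops/us
"""

PATCH = """
Benchmark                                   (size)   Mode  Cnt   Score    Error   Units
VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.669 ±  0.004  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  15.989 ±  0.319  ops/us
"""

if __name__ == "__main__":
    for name, change, overlap in compare(MAIN, PATCH):
        note = "within error" if overlap else "outside error"
        print(f"{name:<45} {change:+6.1f}%  ({note})")
```

Throughput mode (`thrpt`) means higher scores are better, so a negative change is a slowdown.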
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

