uschindler commented on PR #12667:
URL: https://github.com/apache/lucene/pull/12667#issuecomment-1761940947

I ran all benchmarks in module mode (second line of assemble output) on my AVX-256 laptop:

Processor: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz, 1992 MHz, 4 core(s), 8 logical processor(s)

```
C:\Users\Uwe Schindler\Projects\lucene\lucene>"C:\Program Files\Java\jdk-21\bin\java" --module-path lucene\benchmark-jmh\build\benchmarks --module org.apache.lucene.benchmark.jmh
[...]
# Benchmark: org.apache.lucene.benchmark.jmh.VectorUtilBenchmark.floatSquareScalar
# Parameters: (size = 1024)

# Run progress: 90,63% complete, ETA 00:03:42
# Fork: 1 of 1
# Warmup Iteration   1: Okt. 13, 2023 6:31:40 PM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
WARNUNG: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
[...]
# Benchmark: org.apache.lucene.benchmark.jmh.VectorUtilBenchmark.floatSquareVector
# Parameters: (size = 1024)

# Run progress: 98,96% complete, ETA 00:00:24
# Fork: 1 of 1
WARNING: Using incubator modules: jdk.incubator.vector
# Warmup Iteration   1: Okt. 13, 2023 6:34:58 PM org.apache.lucene.internal.vectorization.PanamaVectorizationProvider <init>
INFORMATION: Java vector incubator API enabled; uses preferredBitSize=256
[...]
Benchmark                                   (size)   Mode  Cnt    Score    Error   Units
VectorUtilBenchmark.binaryCosineScalar           1  thrpt    5  111,963 ± 55,067  ops/us
VectorUtilBenchmark.binaryCosineScalar         128  thrpt    5    6,607 ±  0,530  ops/us
VectorUtilBenchmark.binaryCosineScalar         207  thrpt    5    4,297 ±  0,268  ops/us
VectorUtilBenchmark.binaryCosineScalar         256  thrpt    5    3,591 ±  0,098  ops/us
VectorUtilBenchmark.binaryCosineScalar         300  thrpt    5    2,831 ±  0,761  ops/us
VectorUtilBenchmark.binaryCosineScalar         512  thrpt    5    1,749 ±  0,261  ops/us
VectorUtilBenchmark.binaryCosineScalar         702  thrpt    5    1,272 ±  0,439  ops/us
VectorUtilBenchmark.binaryCosineScalar        1024  thrpt    5    0,846 ±  0,212  ops/us
VectorUtilBenchmark.binaryCosineVector           1  thrpt    5  116,594 ± 19,379  ops/us
VectorUtilBenchmark.binaryCosineVector         128  thrpt    5   23,696 ±  0,971  ops/us
VectorUtilBenchmark.binaryCosineVector         207  thrpt    5   15,562 ±  1,261  ops/us
VectorUtilBenchmark.binaryCosineVector         256  thrpt    5   15,580 ±  0,818  ops/us
VectorUtilBenchmark.binaryCosineVector         300  thrpt    5   10,589 ±  9,402  ops/us
VectorUtilBenchmark.binaryCosineVector         512  thrpt    5    8,864 ±  1,360  ops/us
VectorUtilBenchmark.binaryCosineVector         702  thrpt    5    5,632 ±  0,152  ops/us
VectorUtilBenchmark.binaryCosineVector        1024  thrpt    5    4,033 ±  0,966  ops/us
VectorUtilBenchmark.binaryDotProductScalar       1  thrpt    5  276,530 ± 38,515  ops/us
VectorUtilBenchmark.binaryDotProductScalar     128  thrpt    5   13,190 ±  0,303  ops/us
VectorUtilBenchmark.binaryDotProductScalar     207  thrpt    5    8,590 ±  0,332  ops/us
VectorUtilBenchmark.binaryDotProductScalar     256  thrpt    5    6,982 ±  0,256  ops/us
VectorUtilBenchmark.binaryDotProductScalar     300  thrpt    5    6,007 ±  0,243  ops/us
VectorUtilBenchmark.binaryDotProductScalar     512  thrpt    5    3,463 ±  0,433  ops/us
VectorUtilBenchmark.binaryDotProductScalar     702  thrpt    5    2,602 ±  0,063  ops/us
VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt    5    1,755 ±  0,073  ops/us
VectorUtilBenchmark.binaryDotProductVector       1  thrpt    5  154,801 ± 45,755  ops/us
VectorUtilBenchmark.binaryDotProductVector     128  thrpt    5   50,450 ± 10,559  ops/us
VectorUtilBenchmark.binaryDotProductVector     207  thrpt    5   30,656 ±  1,151  ops/us
VectorUtilBenchmark.binaryDotProductVector     256  thrpt    5   30,256 ±  1,618  ops/us
VectorUtilBenchmark.binaryDotProductVector     300  thrpt    5   23,890 ±  6,478  ops/us
VectorUtilBenchmark.binaryDotProductVector     512  thrpt    5   16,696 ±  0,571  ops/us
VectorUtilBenchmark.binaryDotProductVector     702  thrpt    5   11,718 ±  0,265  ops/us
VectorUtilBenchmark.binaryDotProductVector    1024  thrpt    5    8,760 ±  0,194  ops/us
VectorUtilBenchmark.binarySquareScalar           1  thrpt    5  251,177 ± 83,185  ops/us
VectorUtilBenchmark.binarySquareScalar         128  thrpt    5   11,902 ±  1,279  ops/us
VectorUtilBenchmark.binarySquareScalar         207  thrpt    5    7,244 ±  2,344  ops/us
VectorUtilBenchmark.binarySquareScalar         256  thrpt    5    5,975 ±  1,489  ops/us
VectorUtilBenchmark.binarySquareScalar         300  thrpt    5    5,089 ±  0,309  ops/us
VectorUtilBenchmark.binarySquareScalar         512  thrpt    5    3,139 ±  0,205  ops/us
VectorUtilBenchmark.binarySquareScalar         702  thrpt    5    2,325 ±  0,200  ops/us
VectorUtilBenchmark.binarySquareScalar        1024  thrpt    5    1,586 ±  0,032  ops/us
VectorUtilBenchmark.binarySquareVector           1  thrpt    5  179,243 ± 12,767  ops/us
VectorUtilBenchmark.binarySquareVector         128  thrpt    5   41,748 ±  1,302  ops/us
VectorUtilBenchmark.binarySquareVector         207  thrpt    5   25,865 ±  0,939  ops/us
VectorUtilBenchmark.binarySquareVector         256  thrpt    5   25,354 ±  1,070  ops/us
VectorUtilBenchmark.binarySquareVector         300  thrpt    5   20,371 ±  0,653  ops/us
VectorUtilBenchmark.binarySquareVector         512  thrpt    5   14,283 ±  0,631  ops/us
VectorUtilBenchmark.binarySquareVector         702  thrpt    5    9,980 ±  0,344  ops/us
VectorUtilBenchmark.binarySquareVector        1024  thrpt    5    6,684 ±  3,338  ops/us
VectorUtilBenchmark.floatCosineScalar            1  thrpt    5  190,660 ±  5,937  ops/us
VectorUtilBenchmark.floatCosineScalar          128  thrpt    5    7,029 ±  0,202  ops/us
VectorUtilBenchmark.floatCosineScalar          207  thrpt    5    4,424 ±  0,116  ops/us
VectorUtilBenchmark.floatCosineScalar          256  thrpt    5    3,473 ±  1,401  ops/us
VectorUtilBenchmark.floatCosineScalar          300  thrpt    5    3,144 ±  0,048  ops/us
VectorUtilBenchmark.floatCosineScalar          512  thrpt    5    1,653 ±  0,030  ops/us
VectorUtilBenchmark.floatCosineScalar          702  thrpt    5    1,210 ±  0,037  ops/us
VectorUtilBenchmark.floatCosineScalar         1024  thrpt    5    0,795 ±  0,049  ops/us
VectorUtilBenchmark.floatCosineVector            1  thrpt    5  132,462 ±  7,447  ops/us
VectorUtilBenchmark.floatCosineVector          128  thrpt    5   26,174 ±  0,621  ops/us
VectorUtilBenchmark.floatCosineVector          207  thrpt    5   15,948 ±  2,942  ops/us
VectorUtilBenchmark.floatCosineVector          256  thrpt    5   17,445 ±  2,915  ops/us
VectorUtilBenchmark.floatCosineVector          300  thrpt    5   14,293 ±  1,994  ops/us
VectorUtilBenchmark.floatCosineVector          512  thrpt    5   11,711 ±  0,523  ops/us
VectorUtilBenchmark.floatCosineVector          702  thrpt    5    8,415 ±  0,228  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt    5    6,859 ±  0,244  ops/us
VectorUtilBenchmark.floatDotProductScalar        1  thrpt    5  211,648 ±  9,064  ops/us
VectorUtilBenchmark.floatDotProductScalar      128  thrpt    5   16,845 ±  4,619  ops/us
VectorUtilBenchmark.floatDotProductScalar      207  thrpt    5   11,864 ±  0,254  ops/us
VectorUtilBenchmark.floatDotProductScalar      256  thrpt    5    9,471 ±  0,368  ops/us
VectorUtilBenchmark.floatDotProductScalar      300  thrpt    5    8,323 ±  0,262  ops/us
VectorUtilBenchmark.floatDotProductScalar      512  thrpt    5    4,850 ±  0,147  ops/us
VectorUtilBenchmark.floatDotProductScalar      702  thrpt    5    3,622 ±  0,182  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt    5    2,412 ±  0,188  ops/us
VectorUtilBenchmark.floatDotProductVector        1  thrpt    5  186,701 ±  9,767  ops/us
VectorUtilBenchmark.floatDotProductVector      128  thrpt    5   53,626 ± 12,555  ops/us
VectorUtilBenchmark.floatDotProductVector      207  thrpt    5   31,024 ±  0,692  ops/us
VectorUtilBenchmark.floatDotProductVector      256  thrpt    5   38,246 ±  0,693  ops/us
VectorUtilBenchmark.floatDotProductVector      300  thrpt    5   26,674 ±  0,843  ops/us
VectorUtilBenchmark.floatDotProductVector      512  thrpt    5   22,742 ±  0,608  ops/us
VectorUtilBenchmark.floatDotProductVector      702  thrpt    5   15,759 ±  0,346  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt    5   13,512 ±  0,549  ops/us
VectorUtilBenchmark.floatSquareScalar            1  thrpt    5  306,048 ±  3,798  ops/us
VectorUtilBenchmark.floatSquareScalar          128  thrpt    5   13,377 ±  0,446  ops/us
VectorUtilBenchmark.floatSquareScalar          207  thrpt    5    7,939 ±  0,447  ops/us
VectorUtilBenchmark.floatSquareScalar          256  thrpt    5    6,760 ±  0,396  ops/us
VectorUtilBenchmark.floatSquareScalar          300  thrpt    5    5,415 ±  0,220  ops/us
VectorUtilBenchmark.floatSquareScalar          512  thrpt    5    3,261 ±  0,956  ops/us
VectorUtilBenchmark.floatSquareScalar          702  thrpt    5    2,394 ±  0,072  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt    5    1,686 ±  0,043  ops/us
VectorUtilBenchmark.floatSquareVector            1  thrpt    5  182,811 ± 27,177  ops/us
VectorUtilBenchmark.floatSquareVector          128  thrpt    5   53,693 ±  2,607  ops/us
VectorUtilBenchmark.floatSquareVector          207  thrpt    5   26,490 ±  0,620  ops/us
VectorUtilBenchmark.floatSquareVector          256  thrpt    5   29,780 ±  2,791  ops/us
VectorUtilBenchmark.floatSquareVector          300  thrpt    5   22,809 ±  0,417  ops/us
VectorUtilBenchmark.floatSquareVector          512  thrpt    5   18,750 ±  1,414  ops/us
VectorUtilBenchmark.floatSquareVector          702  thrpt    5   13,374 ±  0,513  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt    5   11,284 ±  0,261  ops/us
```
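
To compare a Scalar row with its Vector counterpart, one can divide the two throughput scores. The following is only a minimal sketch for reading the results: the class `SpeedupSketch` and its helpers are hypothetical names and not part of Lucene or JMH, and it assumes the scores are printed with a decimal comma, as in the output of this run.

```java
public class SpeedupSketch {

  // Parses a score printed with a decimal comma (e.g. "13,512") into a double.
  // Assumption: the score contains no thousands separators.
  static double parseScore(String text) {
    return Double.parseDouble(text.trim().replace(',', '.'));
  }

  // Ratio of vectorized to scalar throughput; both scores are in ops/us.
  // A value above 1 means the Vector variant completed more operations per microsecond.
  static double speedup(String scalarScore, String vectorScore) {
    return parseScore(vectorScore) / parseScore(scalarScore);
  }

  public static void main(String[] args) {
    // Scores copied from the size = 1024 rows of the results.
    System.out.printf("floatDotProduct  (1024): %.2fx%n", speedup("2,412", "13,512"));
    System.out.printf("binaryDotProduct (1024): %.2fx%n", speedup("1,755", "8,760"));
    // Scores copied from the size = 1 rows of floatSquare.
    System.out.printf("floatSquare         (1): %.2fx%n", speedup("306,048", "182,811"));
  }
}
```

A ratio below 1, as for the size = 1 `floatSquare` rows, means the Scalar variant had the higher throughput in this run. The sketch ignores the Error column, so ratios for rows with a large error should be read with care.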