rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556403183
I pushed a new benchmark to https://github.com/rmuir/vectorbench for the binary dot product. Basically, it has to behave like this:

```
int sum = 0;
for (int i = 0; i < a.length; i++) {
  // byte * byte always fits in a short
  short product = (short) (a[i] * b[i]);
  sum += (int) product;
}
```

So it is tricky to do with a totally generic implementation (just using SPECIES_PREFERRED). For AVX 256, it means you read a byte vector of length 32, then process each half as shorts (2 short vectors of length 16), and then do the same thing again for each of those halves as ints (4 int vectors of length 8). This generic approach only gives me a 2x speedup, which is a little disappointing.

But this is a stupid approach if you have 256-bit vectors. You can just use ByteVector.SPECIES_64, ShortVector.SPECIES_128, and IntVector.SPECIES_256: 8 bytes widen to 8 shorts and then to 8 ints, with no splitting into halves, and the whole thing is much faster.

On my Skylake (has AVX 256 and gets the optimized 256-bit impl):

```
Benchmark                                (size)   Mode  Cnt    Score   Error   Units
BinaryDotProductBenchmark.dotProductNew       1  thrpt    5  159.476 ± 8.177  ops/us
BinaryDotProductBenchmark.dotProductNew     128  thrpt    5   41.759 ± 0.267  ops/us
BinaryDotProductBenchmark.dotProductNew     207  thrpt    5   25.094 ± 0.107  ops/us
BinaryDotProductBenchmark.dotProductNew     256  thrpt    5   24.841 ± 0.124  ops/us
BinaryDotProductBenchmark.dotProductNew     300  thrpt    5   19.624 ± 0.891  ops/us
BinaryDotProductBenchmark.dotProductNew     512  thrpt    5   13.763 ± 0.171  ops/us
BinaryDotProductBenchmark.dotProductNew     702  thrpt    5    9.792 ± 0.388  ops/us
BinaryDotProductBenchmark.dotProductNew    1024  thrpt    5    6.878 ± 0.834  ops/us
BinaryDotProductBenchmark.dotProductOld       1  thrpt    5  160.423 ± 6.845  ops/us
BinaryDotProductBenchmark.dotProductOld     128  thrpt    5   13.300 ± 0.159  ops/us
BinaryDotProductBenchmark.dotProductOld     207  thrpt    5    8.678 ± 0.293  ops/us
BinaryDotProductBenchmark.dotProductOld     256  thrpt    5    6.892 ± 0.331  ops/us
BinaryDotProductBenchmark.dotProductOld     300  thrpt    5    6.008 ± 0.438  ops/us
BinaryDotProductBenchmark.dotProductOld     512  thrpt    5    3.613 ± 0.192  ops/us
BinaryDotProductBenchmark.dotProductOld     702  thrpt    5    2.710 ± 0.167  ops/us
BinaryDotProductBenchmark.dotProductOld    1024  thrpt    5    1.825 ± 0.125  ops/us
```
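To make the fixed-width layout concrete, here is a rough scalar model of it in plain Java. This is only a sketch and not the code from the PR or from vectorbench: the class name `DotProductLanes` and the methods `scalar` and `laneModel` are made up for illustration, and each inner loop stands in for what would be a single vector operation. In real Vector API code the widening steps would presumably be conversions such as `convertShape` with `VectorOperators.B2S` and `S2I`, which is an assumption on my part about how one would write it.

```java
final class DotProductLanes {
    // 8 lanes: 8 bytes = 64 bits, 8 shorts = 128 bits, 8 ints = 256 bits.
    private static final int LANES = 8;

    /** Scalar reference: multiply as short, accumulate as int. */
    static int scalar(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            short product = (short) (a[i] * b[i]); // byte * byte fits in a short
            sum += product;
        }
        return sum;
    }

    /** Scalar model of the 64/128/256-bit scheme; not actual Vector API code. */
    static int laneModel(byte[] a, byte[] b) {
        int[] acc = new int[LANES];      // models one 256-bit int accumulator
        short[] prod = new short[LANES]; // models one 128-bit short vector
        int bound = a.length - (a.length % LANES);
        int i = 0;
        for (; i < bound; i += LANES) {
            // "Load" 8 bytes from each input, widen to short, multiply lane-wise.
            for (int lane = 0; lane < LANES; lane++) {
                short x = a[i + lane];
                short y = b[i + lane];
                prod[lane] = (short) (x * y);
            }
            // Widen the 8 short products to int and add into the accumulator.
            for (int lane = 0; lane < LANES; lane++) {
                acc[lane] += prod[lane];
            }
        }
        // Horizontal reduction of the accumulator lanes.
        int sum = 0;
        for (int lane : acc) {
            sum += lane;
        }
        // Scalar tail for the elements that do not fill a whole block.
        for (; i < a.length; i++) {
            sum += (short) (a[i] * b[i]);
        }
        return sum;
    }
}
```

The point of the model is that the lane count stays at 8 through both widenings, so nothing has to be split into halves, which is the difference from the SPECIES_PREFERRED version described above.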