rmuir opened a new pull request, #12632:
URL: https://github.com/apache/lucene/pull/12632

   We can get these functions (the binary cosine, dot product, and square distance benchmarked below) closer to optimal by converting directly to 32 bits and multiplying with `vpmulld`.
   
   See https://stackoverflow.com/a/69848057 for the motivation.
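
   To illustrate the idea, here is a minimal scalar sketch of what a "convert directly to 32 bits, then multiply" loop computes. It is not the code from this PR: the class name, the `LANES` constant, and the lane-by-lane inner loop are assumptions made for illustration. The inner loop stands in for one 256-bit step, where 8 bytes from each input are sign-extended to 8 ints and multiplied lane-wise in 32 bits (the kind of multiply `vpmulld` performs), then added to an int accumulator.

```java
public class ByteDotProductSketch {
    // Hypothetical lane count: a 256-bit register holds eight 32-bit ints.
    static final int LANES = 8;

    static int dotProduct(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        int[] acc = new int[LANES];            // stands in for one int accumulator vector
        int bound = a.length - (a.length % LANES);
        int i = 0;
        for (; i < bound; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                // Each byte is sign-extended straight to int, so the multiply
                // and the accumulation both happen in 32 bits.
                acc[lane] += a[i + lane] * b[i + lane];
            }
        }
        int sum = 0;
        for (int v : acc) {                    // horizontal reduction of the lanes
            sum += v;
        }
        for (; i < a.length; i++) {            // scalar tail for leftover elements
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, -4, 5, 6, 7, 8, 9, 10};
        byte[] b = {1, 1, 1, 1, 1, 1, 1, 1, 2, -2};
        System.out.println(dotProduct(a, b));
    }
}
```

   A vectorized version would replace the inner loop with a single widening conversion, a single 32-bit multiply, and a single add per step; the scalar form above only shows the arithmetic.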
   
   You can reproduce my results with `java -jar target/vectorbench.jar -p size=1024 Binary`. See https://github.com/rmuir/vectorbench for instructions. There is a README now! I don't touch Java that much, so it makes it easier on me.
   
   Skylake (256-bit):
   
   ```
   Benchmark                                   (size)   Mode  Cnt  Score   Error   Units
   BinaryCosineBenchmark.cosineDistanceNew       1024  thrpt    5  3.252 ± 1.457  ops/us
   BinaryCosineBenchmark.cosineDistanceNewNew    1024  thrpt    5  3.746 ± 0.069  ops/us
   BinaryDotProductBenchmark.dotProductNew       1024  thrpt    5  7.080 ± 0.121  ops/us
   BinaryDotProductBenchmark.dotProductNewNew    1024  thrpt    5  8.329 ± 0.288  ops/us
   BinarySquareBenchmark.squareDistanceNew       1024  thrpt    5  6.208 ± 0.800  ops/us
   BinarySquareBenchmark.squareDistanceNewNew    1024  thrpt    5  7.285 ± 0.629  ops/us
   ```
   
   I'd appreciate it if someone could test AVX-512. This codepath only affects vectors of 256 bits or wider, so it won't change anything on a Mac with 128-bit vectors. I will look into that case again separately: I know how to speed it up, but I don't want to make things ugly. I like this change because it makes the code a little simpler.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

