rmuir commented on PR #14031: URL: https://github.com/apache/lucene/pull/14031#issuecomment-2513295791
I applied and tested the same approach with the other 2 functions too.

cosine was already under the limit: it is only unrolled twice due to the complexity of the formula, but applying the change there too keeps the float functions consistent.

We could tidy up the binary ones in a similar fashion as a followup for more consistency, but since the JVM can already unroll the integer math, they aren't manually unrolled, and I expect they are already under the limit.

Microbenchmarks seem happy, but I assume the real gains will show up in macrobenchmarks, where the inlining can help.

```
Before:
Benchmark                                   (size)   Mode  Cnt   Score    Error   Units  "body" size
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   8.216 ±  0.026  ops/us  345 bytes
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  12.466 ±  0.100  ops/us  355 bytes
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  11.986 ±  0.074  ops/us  400 bytes

After:
Benchmark                                   (size)   Mode  Cnt   Score    Error   Units  "body" size
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   8.377 ±  0.040  ops/us  320 bytes
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  12.917 ±  0.113  ops/us  302 bytes
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  11.965 ±  0.089  ops/us  347 bytes
```
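For anyone not familiar with the "unrolled twice" shape mentioned for cosine, here is a rough scalar sketch of the idea. It is not the code from the PR: the class name `UnrolledCosine` and its method are hypothetical, and it uses plain `float` arithmetic rather than whatever vectorized form the real functions use. It only shows why cosine needs three running sums per unrolled step (dot product plus both norms), which is what makes deeper unrolling expensive in body size.

```java
public final class UnrolledCosine {
  private UnrolledCosine() {}

  /** Cosine similarity with the main loop unrolled twice (hypothetical helper, scalar only). */
  public static float cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    // Two independent accumulators per quantity, so the two halves of each
    // unrolled step do not depend on each other.
    float dot0 = 0f, dot1 = 0f;
    float normA0 = 0f, normA1 = 0f;
    float normB0 = 0f, normB1 = 0f;

    int i = 0;
    int bound = a.length & ~1; // largest even number <= length
    for (; i < bound; i += 2) {
      float a0 = a[i], b0 = b[i];
      float a1 = a[i + 1], b1 = b[i + 1];
      dot0 += a0 * b0;
      normA0 += a0 * a0;
      normB0 += b0 * b0;
      dot1 += a1 * b1;
      normA1 += a1 * a1;
      normB1 += b1 * b1;
    }
    // Tail: at most one leftover element when the length is odd.
    for (; i < a.length; i++) {
      dot0 += a[i] * b[i];
      normA0 += a[i] * a[i];
      normB0 += b[i] * b[i];
    }

    float dot = dot0 + dot1;
    float normA = normA0 + normA1;
    float normB = normB0 + normB1;
    // Note: a zero-length or all-zero vector yields NaN here; this sketch does not guard against it.
    return (float) (dot / Math.sqrt((double) normA * (double) normB));
  }
}
```

Dot product and square distance each need only one running sum per step, so in a sketch like this they could be unrolled further for the same amount of code.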