kaivalnp opened a new pull request, #14874: URL: https://github.com/apache/lucene/pull/14874
### Description

In #14863, I noticed a regression in computing vector scores when the query vector was on heap (i.e. allocated using `MemorySegment.ofArray`) but the doc vector was off-heap in RAM (mmap-ed input).

We do have a JMH [benchmark](https://github.com/apache/lucene/blob/f4339ee2aea65bdf1efb2a3c196c3e9a4adf9d67/lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorScorerBenchmark.java#L91) for testing the off-heap scoring performance -- but it only tests the indexing case (using an [`UpdateableRandomVectorScorer`](https://github.com/apache/lucene/blob/f4339ee2aea65bdf1efb2a3c196c3e9a4adf9d67/lucene/core/src/java/org/apache/lucene/util/hnsw/UpdateableRandomVectorScorer.java#L29C18-L29C46)), where both vectors are in RAM (mmap-ed input).

I added benchmark functions to cover the searching case as well (using a [`RandomVectorScorer`](https://github.com/apache/lucene/blob/f4339ee2aea65bdf1efb2a3c196c3e9a4adf9d67/lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorer.java#L28)).

`main`:

```
Benchmark                                               (size)   Mode  Cnt  Score   Error   Units
VectorScorerBenchmark.binaryDotProductIndexingDefault     1024  thrpt   15  2.299 ± 0.017  ops/us
VectorScorerBenchmark.binaryDotProductIndexingMemSeg      1024  thrpt   15  7.533 ± 0.107  ops/us
VectorScorerBenchmark.binaryDotProductSearchingDefault    1024  thrpt   15  2.332 ± 0.017  ops/us
VectorScorerBenchmark.binaryDotProductSearchingMemSeg     1024  thrpt   15  2.069 ± 0.069  ops/us
```

This PR:

```
Benchmark                                               (size)   Mode  Cnt  Score   Error   Units
VectorScorerBenchmark.binaryDotProductIndexingDefault     1024  thrpt   15  2.295 ± 0.012  ops/us
VectorScorerBenchmark.binaryDotProductIndexingMemSeg      1024  thrpt   15  7.551 ± 0.027  ops/us
VectorScorerBenchmark.binaryDotProductSearchingDefault    1024  thrpt   15  2.341 ± 0.019  ops/us
VectorScorerBenchmark.binaryDotProductSearchingMemSeg     1024  thrpt   15  4.241 ± 0.064  ops/us
```

We see a ~2x improvement in vector scoring throughput for the searching case (`binaryDotProductSearchingMemSeg`: 2.069 -> 4.241 ops/us)! I'm not sure if this is specific to my machine or something more general.
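For readers unfamiliar with the two cases, here is a minimal, hypothetical sketch of the difference between the "indexing" layout (both vectors off-heap) and the "searching" layout (query on heap, doc off-heap). It is not Lucene's scorer code: it swaps in `java.nio.ByteBuffer` (heap via `wrap`, off-heap via `allocateDirect`) for `MemorySegment` so that it runs on older JDKs, uses a plain scalar loop, and the class and method names (`MixedDotProduct`, `onHeap`, `offHeap`, `dotProduct`) are made up for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of Lucene.
final class MixedDotProduct {

    // Query-side vector: backed by a Java array on the heap.
    static ByteBuffer onHeap(byte[] v) {
        return ByteBuffer.wrap(v);
    }

    // Doc-side vector: copied into off-heap (direct) memory,
    // standing in for an mmap-ed index input.
    static ByteBuffer offHeap(byte[] v) {
        ByteBuffer b = ByteBuffer.allocateDirect(v.length);
        b.put(v);
        b.flip(); // reset position to 0 so the data is readable
        return b;
    }

    // Scalar dot product over signed bytes; works for any mix of
    // heap and direct buffers.
    static int dotProduct(ByteBuffer a, ByteBuffer b) {
        int n = a.remaining();
        if (n != b.remaining()) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a.get(a.position() + i) * b.get(b.position() + i);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] query = {1, -2, 3, 4};
        byte[] doc = {5, 6, -7, 8};

        // "Indexing" layout: both vectors off-heap.
        int indexing = dotProduct(offHeap(query), offHeap(doc));
        // "Searching" layout: query on heap, doc off-heap.
        int searching = dotProduct(onHeap(query), offHeap(doc));

        System.out.println("indexing case:  " + indexing);
        System.out.println("searching case: " + searching);
    }
}
```

Both layouts produce the same score; the benchmarks above measure only how fast each layout is scored.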