[PR] Fix off-heap byte vector scoring at query time [lucene]

via GitHub Mon, 30 Jun 2025 09:02:30 -0700


kaivalnp opened a new pull request, #14874:
URL: https://github.com/apache/lucene/pull/14874


   ### Description
   
   In #14863, I noticed a regression in computing vector scores when the query 
vector was on heap (i.e. wrapped using `MemorySegment.ofArray`) but the doc 
vector was off-heap (mmap-ed input)
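
To make the two memory kinds concrete, here is a minimal sketch of an on-heap query segment scored against an off-heap doc segment. It is not Lucene's scorer: the class name `HeapVsOffHeapDotProduct` and the helpers `dotProduct` and `toOffHeap` are hypothetical, the loop is a plain scalar dot product, and a natively allocated segment stands in for an mmap-ed input. It assumes a JDK where `java.lang.foreign` is final (Java 22 or later).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class HeapVsOffHeapDotProduct {

  // Scalar byte dot product over two segments, wherever their memory lives.
  // (Hypothetical helper for illustration, not a Lucene API.)
  static int dotProduct(MemorySegment a, MemorySegment b) {
    if (a.byteSize() != b.byteSize()) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    int sum = 0;
    for (long i = 0; i < a.byteSize(); i++) {
      sum += a.get(ValueLayout.JAVA_BYTE, i) * b.get(ValueLayout.JAVA_BYTE, i);
    }
    return sum;
  }

  // Copies a heap array into native memory; this stands in for an mmap-ed doc vector.
  static MemorySegment toOffHeap(Arena arena, byte[] values) {
    MemorySegment segment = arena.allocate(values.length);
    MemorySegment.copy(values, 0, segment, ValueLayout.JAVA_BYTE, 0, values.length);
    return segment;
  }

  public static void main(String[] args) {
    byte[] query = {1, 2, 3, 4};
    byte[] doc = {5, 6, 7, 8};
    try (Arena arena = Arena.ofConfined()) {
      // The query is backed by a Java array, so it is a heap segment.
      MemorySegment onHeapQuery = MemorySegment.ofArray(query);
      // The doc lives in native memory, so it is an off-heap segment.
      MemorySegment offHeapDoc = toOffHeap(arena, doc);

      System.out.println("query is native: " + onHeapQuery.isNative());
      System.out.println("doc is native: " + offHeapDoc.isNative());
      System.out.println("dot product: " + dotProduct(onHeapQuery, offHeapDoc));
    }
  }
}
```

The mixed case (heap query, native doc) is the one the searching benchmarks below exercise; the indexing benchmarks compare two off-heap vectors.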
   
   We do have a JMH 
[benchmark](https://github.com/apache/lucene/blob/f4339ee2aea65bdf1efb2a3c196c3e9a4adf9d67/lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorScorerBenchmark.java#L91)
 for testing the off-heap scoring performance -- but it only tests the indexing 
case (using an 
[`UpdateableRandomVectorScorer`](https://github.com/apache/lucene/blob/f4339ee2aea65bdf1efb2a3c196c3e9a4adf9d67/lucene/core/src/java/org/apache/lucene/util/hnsw/UpdateableRandomVectorScorer.java#L29C18-L29C46)),
 when both vectors are off-heap (mmap-ed input)
   
   I also added benchmark functions to demonstrate the searching case (using a 
[`RandomVectorScorer`](https://github.com/apache/lucene/blob/f4339ee2aea65bdf1efb2a3c196c3e9a4adf9d67/lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorer.java#L28))
   
   `main`:
   ```
   Benchmark                                               (size)   Mode  Cnt  Score   Error   Units
   VectorScorerBenchmark.binaryDotProductIndexingDefault     1024  thrpt   15  2.299 ± 0.017  ops/us
   VectorScorerBenchmark.binaryDotProductIndexingMemSeg      1024  thrpt   15  7.533 ± 0.107  ops/us
   VectorScorerBenchmark.binaryDotProductSearchingDefault    1024  thrpt   15  2.332 ± 0.017  ops/us
   VectorScorerBenchmark.binaryDotProductSearchingMemSeg     1024  thrpt   15  2.069 ± 0.069  ops/us
   ```
   
   This PR:
   ```
   Benchmark                                               (size)   Mode  Cnt  Score   Error   Units
   VectorScorerBenchmark.binaryDotProductIndexingDefault     1024  thrpt   15  2.295 ± 0.012  ops/us
   VectorScorerBenchmark.binaryDotProductIndexingMemSeg      1024  thrpt   15  7.551 ± 0.027  ops/us
   VectorScorerBenchmark.binaryDotProductSearchingDefault    1024  thrpt   15  2.341 ± 0.019  ops/us
   VectorScorerBenchmark.binaryDotProductSearchingMemSeg     1024  thrpt   15  4.241 ± 0.064  ops/us
   ```
   
   We see a ~2x improvement in throughput for the searching `MemSeg` case (2.069 -> 4.241 ops/us)!
   I'm not sure whether this is specific to my machine or holds in general.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

