ChrisHegarty opened a new pull request, #14980:
URL: https://github.com/apache/lucene/pull/14980

   This commit adds support for bulk scoring vectors, via a 
`RandomVectorScorer::scoreBulk` method.
   
   The POC currently scores 4 vectors against the query vector at a time, just 
as an experiment to see how much can be gained with a minimal change.
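
   To illustrate the idea of scoring 4 vectors per pass, here is a minimal,
self-contained sketch. It is not the code in this PR: the class
`BulkDotProductSketch` and the methods `dotProductBulk4` and `scoreBulkSketch`
are hypothetical names, and plain `float[]` arrays stand in for Lucene's vector
storage. The point it shows is that each query element is loaded once and
reused for four candidate vectors, with a scalar loop for any remainder.

```java
import java.util.Arrays;

/** Hypothetical sketch of 4-at-a-time scoring; NOT the Lucene implementation. */
final class BulkDotProductSketch {

  /** Scalar dot product of two equal-length vectors (the one-at-a-time path). */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Scores four vectors against the query in a single pass over the query. */
  static void dotProductBulk4(
      float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] scores, int offset) {
    float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
    for (int i = 0; i < q.length; i++) {
      float qi = q[i]; // read the query element once, use it four times
      s0 += qi * v0[i];
      s1 += qi * v1[i];
      s2 += qi * v2[i];
      s3 += qi * v3[i];
    }
    scores[offset] = s0;
    scores[offset + 1] = s1;
    scores[offset + 2] = s2;
    scores[offset + 3] = s3;
  }

  /** Scores all vectors in blocks of four, then handles the tail one by one. */
  static void scoreBulkSketch(float[] query, float[][] vectors, float[] scores) {
    int bound = vectors.length - (vectors.length % 4); // largest multiple of 4
    int i = 0;
    for (; i < bound; i += 4) {
      dotProductBulk4(
          query, vectors[i], vectors[i + 1], vectors[i + 2], vectors[i + 3], scores, i);
    }
    for (; i < vectors.length; i++) {
      scores[i] = dotProduct(query, vectors[i]); // remainder: fewer than 4 left
    }
  }

  public static void main(String[] args) {
    float[] query = {1f, 2f, 3f};
    float[][] vectors = {
      {1f, 0f, 0f}, {0f, 1f, 0f}, {0f, 0f, 1f}, {1f, 1f, 1f}, {2f, 2f, 2f}
    };
    float[] scores = new float[vectors.length];
    scoreBulkSketch(query, vectors, scores);
    System.out.println(Arrays.toString(scores)); // [1.0, 2.0, 3.0, 6.0, 12.0]
  }
}
```

   The block size of 4 is fixed here only to mirror the description above; a
different block size would need a correspondingly different inner method.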
   
   
   Initial results from the micro benchmarks show good potential improvement: 
`dotProductNewBulkScore` runs in well under half the time of 
`dotProductDefault`. The benchmark creates a flat vector index with 128,000 
float32 vectors with 1024 dimensions (~500MB) and times how long it takes to 
score 20,000 random vectors against a query vector (lower times are better):
     
   ```
   Benchmark                                            (size)  Mode  Cnt  Score   Error  Units
   VectorScorerFloat32Benchmark.dotProductDefault         1024  avgt   15  8.505 ± 0.256  ms/op
   VectorScorerFloat32Benchmark.dotProductNewBulkScore    1024  avgt   15  3.717 ± 0.158  ms/op
   VectorScorerFloat32Benchmark.dotProductNewScorer       1024  avgt   15  7.287 ± 0.181  ms/op
   ```
   
   Notes:
   * The implementation is quite crude for now; I'm just trying to find the 
"sweet spot".
   * The bulk scorer just does 4 vectors at a time, since that makes the 
implementation in Lucene more straightforward, but this could be adjusted.
   * Initial Luceneutil benchmarks show some positive results, but not as much 
as you would expect. I don't yet know why!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

