ChrisHegarty opened a new pull request, #14980: URL: https://github.com/apache/lucene/pull/14980
This commit adds support for bulk scoring vectors, via a `RandomVectorScorer::scoreBulk` method. The POC currently scores 4 vectors against the query vector at a time, as an experiment to see how much can be gained with a minimal change. Initial results from the micro benchmarks show good potential for improvement: in the numbers below, the bulk scorer is about 2.3x faster than the default.

The benchmark creates a flat vector index of 128,000 float32 vectors with 1024 dimensions (~500MB), and times how long it takes to score 20,000 random vectors against a query vector (lower times are better):

```
Benchmark                                            (size)  Mode  Cnt  Score    Error  Units
VectorScorerFloat32Benchmark.dotProductDefault         1024  avgt   15  8.505 ±  0.256  ms/op
VectorScorerFloat32Benchmark.dotProductNewBulkScore    1024  avgt   15  3.717 ±  0.158  ms/op
VectorScorerFloat32Benchmark.dotProductNewScorer       1024  avgt   15  7.287 ±  0.181  ms/op
```

Notes:
* The implementation is quite crude for now; the goal is just to find the "sweet spot".
* The bulk scorer does only 4 vectors at a time, since that keeps the implementation in Lucene more straightforward, but this could be adjusted.
* Initial Luceneutil benchmarks show some positive results, but not as much as you would expect. I don't yet know why!
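For readers who want a concrete picture of the "4 vectors at a time" idea, here is a minimal scalar sketch. It is not the code in this PR: the class `BulkDotProductSketch`, the method signatures, and the plain `float[][]` storage are all assumptions made for illustration. It only shows the shape of the technique, where each query element is loaded once and reused across four candidate vectors, with a one-at-a-time tail for the remainder.

```java
/** Hypothetical illustration only; names and signatures are not taken from Lucene or this PR. */
final class BulkDotProductSketch {

  /** Baseline: scores a single vector against the query. */
  static float dotProduct(float[] q, float[] v) {
    float sum = 0f;
    for (int d = 0; d < q.length; d++) {
      sum += q[d] * v[d];
    }
    return sum;
  }

  /**
   * Scores vectors[ords[0..count)] against q and writes the results to scores[0..count).
   * Full blocks of four share one pass over the query; leftovers use the baseline.
   */
  static void scoreBulk(float[] q, float[][] vectors, int[] ords, float[] scores, int count) {
    int i = 0;
    int bound = count - (count % 4); // last index covered by full blocks of four
    for (; i < bound; i += 4) {
      float[] v0 = vectors[ords[i]];
      float[] v1 = vectors[ords[i + 1]];
      float[] v2 = vectors[ords[i + 2]];
      float[] v3 = vectors[ords[i + 3]];
      float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
      for (int d = 0; d < q.length; d++) {
        float qd = q[d]; // read the query element once, use it four times
        s0 += qd * v0[d];
        s1 += qd * v1[d];
        s2 += qd * v2[d];
        s3 += qd * v3[d];
      }
      scores[i] = s0;
      scores[i + 1] = s1;
      scores[i + 2] = s2;
      scores[i + 3] = s3;
    }
    for (; i < count; i++) { // tail: fewer than four vectors remain
      scores[i] = dotProduct(q, vectors[ords[i]]);
    }
  }

  public static void main(String[] args) {
    float[] q = {1f, 2f, 3f};
    float[][] vectors = {
      {1f, 0f, 1f}, {0f, 1f, 0f}, {2f, 2f, 2f}, {1f, 1f, 1f}, {0f, 0f, 1f}, {3f, 0f, 0f}
    };
    int[] ords = {5, 0, 3, 1, 4, 2}; // one block of four plus a tail of two
    float[] scores = new float[ords.length];
    scoreBulk(q, vectors, ords, scores, ords.length);
    System.out.println(java.util.Arrays.toString(scores)); // [3.0, 4.0, 6.0, 2.0, 3.0, 12.0]
  }
}
```

Because each per-vector sum is accumulated in the same order as in the baseline, the bulk path returns the same scores as calling `dotProduct` once per vector. A SIMD version would change the accumulation order and could differ in the last bits.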