ChrisHegarty commented on PR #15037: URL: https://github.com/apache/lucene/pull/15037#issuecomment-3164824371
Dumping some jmh results.

Summary:
1. Bulk scoring is ~2x faster in all results.
2. Off-heap scoring of a single vector is anywhere up to 50% better.

linux-arm: c7g.4xlarge, Graviton 3, Neoverse-V1, preferredBitSize=256
```
Benchmark                                           (size)  Mode  Cnt   Score   Error  Units
VectorScorerFloat32Benchmark.cosineDefault            1024  avgt   15  10.805 ± 0.285  ms/op
VectorScorerFloat32Benchmark.cosineDefaultBulk        1024  avgt   15  10.423 ± 0.124  ms/op
VectorScorerFloat32Benchmark.cosineOptBulkScore       1024  avgt   15   5.888 ± 0.151  ms/op
VectorScorerFloat32Benchmark.cosineOptScorer          1024  avgt   15   9.601 ± 0.633  ms/op
VectorScorerFloat32Benchmark.dotProductDefault        1024  avgt   15  10.484 ± 0.201  ms/op
VectorScorerFloat32Benchmark.dotProductDefaultBulk    1024  avgt   15  10.408 ± 0.194  ms/op
VectorScorerFloat32Benchmark.dotProductOptBulkScore   1024  avgt   15   5.856 ± 0.166  ms/op
VectorScorerFloat32Benchmark.dotProductOptScorer      1024  avgt   15   7.214 ± 0.281  ms/op
VectorScorerFloat32Benchmark.euclideanDefault         1024  avgt   15  10.572 ± 0.459  ms/op
VectorScorerFloat32Benchmark.euclideanDefaultBulk     1024  avgt   15  10.692 ± 0.335  ms/op
VectorScorerFloat32Benchmark.euclideanOptBulkScore    1024  avgt   15   5.797 ± 0.279  ms/op
VectorScorerFloat32Benchmark.euclideanOptScorer       1024  avgt   15   8.324 ± 0.504  ms/op
VectorScorerFloat32Benchmark.mipDefault               1024  avgt   15  10.664 ± 0.264  ms/op
VectorScorerFloat32Benchmark.mipDefaultBulk           1024  avgt   15  10.566 ± 0.244  ms/op
VectorScorerFloat32Benchmark.mipOptBulkScore          1024  avgt   15   5.820 ± 0.103  ms/op
VectorScorerFloat32Benchmark.mipOptScorer             1024  avgt   15   7.420 ± 0.615  ms/op
```

linux-x64: m6i.2xlarge, Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, preferredBitSize=512
```
Benchmark                                           (size)  Mode  Cnt   Score   Error  Units
VectorScorerFloat32Benchmark.cosineDefault            1024  avgt   15  11.099 ± 0.225  ms/op
VectorScorerFloat32Benchmark.cosineDefaultBulk        1024  avgt   15  10.563 ± 0.772  ms/op
VectorScorerFloat32Benchmark.cosineOptBulkScore       1024  avgt   15   6.342 ± 0.135  ms/op
VectorScorerFloat32Benchmark.cosineOptScorer          1024  avgt   15   7.302 ± 0.141  ms/op
VectorScorerFloat32Benchmark.dotProductDefault        1024  avgt   15   9.935 ± 0.636  ms/op
VectorScorerFloat32Benchmark.dotProductDefaultBulk    1024  avgt   15  10.014 ± 0.269  ms/op
VectorScorerFloat32Benchmark.dotProductOptBulkScore   1024  avgt   15   5.627 ± 0.139  ms/op
VectorScorerFloat32Benchmark.dotProductOptScorer      1024  avgt   15   5.720 ± 0.119  ms/op
VectorScorerFloat32Benchmark.euclideanDefault         1024  avgt   15  10.751 ± 0.521  ms/op
VectorScorerFloat32Benchmark.euclideanDefaultBulk     1024  avgt   15  10.438 ± 0.718  ms/op
VectorScorerFloat32Benchmark.euclideanOptBulkScore    1024  avgt   15   6.037 ± 0.153  ms/op
VectorScorerFloat32Benchmark.euclideanOptScorer       1024  avgt   15   6.485 ± 0.147  ms/op
VectorScorerFloat32Benchmark.mipDefault               1024  avgt   15   9.672 ± 0.608  ms/op
VectorScorerFloat32Benchmark.mipDefaultBulk           1024  avgt   15   9.910 ± 0.473  ms/op
VectorScorerFloat32Benchmark.mipOptBulkScore          1024  avgt   15   5.722 ± 0.102  ms/op
VectorScorerFloat32Benchmark.mipOptScorer             1024  avgt   15   5.899 ± 0.167  ms/op
```

linux-amd64: m6a.4xlarge, AMD EPYC 7R13 Processor, preferredBitSize=256
```
Benchmark                                           (size)  Mode  Cnt   Score   Error  Units
VectorScorerFloat32Benchmark.cosineDefault            1024  avgt   15   7.643 ± 0.144  ms/op
VectorScorerFloat32Benchmark.cosineDefaultBulk        1024  avgt   15   7.709 ± 0.159  ms/op
VectorScorerFloat32Benchmark.cosineOptBulkScore       1024  avgt   15   4.592 ± 0.026  ms/op
VectorScorerFloat32Benchmark.cosineOptScorer          1024  avgt   15   8.149 ± 0.047  ms/op
VectorScorerFloat32Benchmark.dotProductDefault        1024  avgt   15   7.396 ± 0.151  ms/op
VectorScorerFloat32Benchmark.dotProductDefaultBulk    1024  avgt   15   7.460 ± 0.163  ms/op
VectorScorerFloat32Benchmark.dotProductOptBulkScore   1024  avgt   15   3.915 ± 0.045  ms/op
VectorScorerFloat32Benchmark.dotProductOptScorer      1024  avgt   15   5.920 ± 0.053  ms/op
VectorScorerFloat32Benchmark.euclideanDefault         1024  avgt   15   7.357 ± 0.157  ms/op
VectorScorerFloat32Benchmark.euclideanDefaultBulk     1024  avgt   15   7.284 ± 0.132  ms/op
VectorScorerFloat32Benchmark.euclideanOptBulkScore    1024  avgt   15   4.260 ± 0.050  ms/op
VectorScorerFloat32Benchmark.euclideanOptScorer       1024  avgt   15   6.747 ± 0.047  ms/op
VectorScorerFloat32Benchmark.mipDefault               1024  avgt   15   7.462 ± 0.142  ms/op
VectorScorerFloat32Benchmark.mipDefaultBulk           1024  avgt   15   7.347 ± 0.124  ms/op
VectorScorerFloat32Benchmark.mipOptBulkScore          1024  avgt   15   3.915 ± 0.048  ms/op
VectorScorerFloat32Benchmark.mipOptScorer             1024  avgt   15   5.839 ± 0.085  ms/op
```
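For anyone who wants to check the summary against individual rows, here is a minimal sketch of the arithmetic. It is not part of the PR: the class `SpeedupSketch` and its method names are hypothetical, and pairing `DefaultBulk` with `OptBulkScore` and `Default` with `OptScorer` is an assumption about which rows are meant to be compared. Since the mode is `avgt` in ms/op, lower scores are better.

```java
// Hypothetical helper for reading the JMH tables; not part of Lucene or the PR.
public class SpeedupSketch {

    /** Ratio of baseline time to optimized time; above 1.0 means the optimized run is faster. */
    static double speedup(double baselineMsPerOp, double optimizedMsPerOp) {
        return baselineMsPerOp / optimizedMsPerOp;
    }

    /** Percentage reduction in time per operation relative to the baseline. */
    static double percentReduction(double baselineMsPerOp, double optimizedMsPerOp) {
        return 100.0 * (baselineMsPerOp - optimizedMsPerOp) / baselineMsPerOp;
    }

    public static void main(String[] args) {
        // dotProduct scores (ms/op) copied from the three tables.
        // Column order: Default, DefaultBulk, OptBulkScore, OptScorer.
        String[] platforms = {"linux-arm", "linux-x64", "linux-amd64"};
        double[][] dotProduct = {
            {10.484, 10.408, 5.856, 7.214},
            {9.935, 10.014, 5.627, 5.720},
            {7.396, 7.460, 3.915, 5.920},
        };

        for (int i = 0; i < platforms.length; i++) {
            double[] row = dotProduct[i];
            // Assumed pairing: DefaultBulk vs OptBulkScore, Default vs OptScorer.
            System.out.printf(
                "%s: bulk speedup %.2fx, single-vector time reduced by %.1f%%%n",
                platforms[i],
                speedup(row[1], row[2]),
                percentReduction(row[0], row[3]));
        }
    }
}
```

The same two helpers can be applied to the cosine, euclidean, and mip rows. The sketch ignores the reported error margins, so small differences between rows should not be over-interpreted.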