msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2678496890
I pushed a version that re-uses scores *and* limits per-leaf topK to global topK. The former didn't make very much difference, but the latter change did improve things quite a bit. Here are some numbers from cohere/768d:

### mainline

| recall | latency (ms) | nDoc | topK | fanout | maxConn | beamWidth | quantized | visited | index s | index docs/s | num segments | index size (MB) | vec disk (MB) | vec RAM (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.954 | 12.919 | 500000 | 50 | 0 | 64 | 250 | no | 13786 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |
| 0.981 | 18.488 | 500000 | 50 | 50 | 64 | 250 | no | 20371 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |
| 0.989 | 22.948 | 500000 | 50 | 100 | 64 | 250 | no | 24963 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |

### with reused scores *and* limiting perLeafK <= K

| recall | latency (ms) | nDoc | topK | fanout | maxConn | beamWidth | quantized | visited | index s | index docs/s | num segments | index size (MB) | vec disk (MB) | vec RAM (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.959 | 11.375 | 500000 | 50 | 0 | 64 | 250 | no | 12086 | 308.23 | 1622.15 | 8 | 1501.70 | 1464.844 | 1464.844 |
| 0.979 | 14.926 | 500000 | 50 | 50 | 64 | 250 | no | 16724 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |
| 0.987 | 17.858 | 500000 | 50 | 100 | 64 | 250 | no | 20277 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |

It would be awesome if you could produce similar comparisons for this version, @dungba88!
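To make the two runs easier to compare at a glance, here is a rough Python sketch that computes the per-fanout change in recall, latency, and visited count from the numbers quoted above. It is only arithmetic on the posted figures, not part of the PR or of the benchmark tooling. The names `mainline`, `candidate`, `pct_change`, and `compare` are made up for illustration, and it assumes the ninth value in each mainline row is the visited count, since the mainline header does not label that column.

```python
# Rows copied from the two result sets above, keyed by fanout:
# fanout -> (recall, latency in ms, visited).
# Assumption: the ninth value of each mainline row is "visited".
mainline = {
    0: (0.954, 12.919, 13786),
    50: (0.981, 18.488, 20371),
    100: (0.989, 22.948, 24963),
}
candidate = {
    0: (0.959, 11.375, 12086),
    50: (0.979, 14.926, 16724),
    100: (0.987, 17.858, 20277),
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def compare(base, cand):
    """Pair up rows by fanout and report the deltas between the two runs."""
    rows = []
    for fanout in sorted(base):
        b_recall, b_latency, b_visited = base[fanout]
        c_recall, c_latency, c_visited = cand[fanout]
        rows.append({
            "fanout": fanout,
            "recall_delta": c_recall - b_recall,
            "latency_pct": pct_change(b_latency, c_latency),
            "visited_pct": pct_change(b_visited, c_visited),
        })
    return rows


summary = compare(mainline, candidate)
for row in summary:
    print(
        f"fanout={row['fanout']:>3}  "
        f"recall {row['recall_delta']:+.3f}  "
        f"latency {row['latency_pct']:+.1f}%  "
        f"visited {row['visited_pct']:+.1f}%"
    )
```

On these figures the latency reduction grows with fanout, while recall moves by at most 0.005 in either direction.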