tveasey commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1873862125
IMO we shouldn't focus too much on recall, since the greediness of non-competitive search lets us tune it. My main concern is whether contention on queue updates causes a slowdown. That aside, I think the queue is strictly better. With min-score sharing the search might occasionally visit fewer vertices, because earlier decisions can by chance yield transiently better bounds, but this should be low probability, particularly when the search has to visit many vertices; and those are exactly the cases where we see big wins from using a queue.

There appears to be some evidence of contention. This is suggested by comparing the runtime to the runtime we would expect from the number of vertices visited, e.g.

| scenario | QPS(score) / QPS(queue) | Visited(queue) / Visited(score) |
| --- | --- | --- |
| n=10M, dim=100, k=100, fo=900 | 0.83 | 0.65 |
| n=10M, dim=768, k=100, fo=900 | 0.76 | 0.68 |

If per-vertex cost were equal, the QPS ratio would roughly match the visited ratio; here the queue visits only ~65-68% of the vertices yet the QPS ratio is 0.76-0.83, so each visit is costing more with the queue. Note that the direction of this effect is consistent, but its size is not (fo = 900 shows the largest effect).

However, all that said, we still get significant wins in performance, so my vote would be to use the queue and work on strategies for reducing contention; we had various ideas for ways to achieve this.
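One contention-reducing direction can be sketched as follows. This is a hypothetical illustration, not the patch's actual implementation: instead of every thread pushing into a locked shared queue, threads share only the minimum competitive score as an atomic word, read it cheaply, and attempt a CAS write only when they strictly improve the bound, so most operations never touch the contended write path. `SharedMinScore` and its methods are invented names; the sketch assumes non-negative scores, so the raw float bit pattern is order-preserving under signed int comparison.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of low-contention min-score sharing (not Lucene's
 * actual code). Readers pay one volatile read; writers CAS only when they
 * strictly raise the bound. Assumes scores >= 0, so Float.floatToIntBits
 * preserves ordering under signed int comparison.
 */
class SharedMinScore {
  private final AtomicInteger bits = new AtomicInteger(Float.floatToIntBits(0f));

  /** Current global minimum competitive score. */
  float get() {
    return Float.intBitsToFloat(bits.get());
  }

  /** Publish a better bound; threads that lose the race simply retry or quit. */
  void maybeUpdate(float candidate) {
    int candBits = Float.floatToIntBits(candidate);
    int cur = bits.get();
    // Only writers that actually raise the bound touch the shared word;
    // everyone else backs off after a cheap read.
    while (candBits > cur && !bits.compareAndSet(cur, candBits)) {
      cur = bits.get();
    }
  }
}

public class Demo {
  public static void main(String[] args) throws InterruptedException {
    SharedMinScore shared = new SharedMinScore();
    Runnable worker = () -> {
      for (int i = 0; i < 10_000; i++) {
        shared.maybeUpdate(i / 10_000f); // simulated candidate scores
      }
    };
    Thread t1 = new Thread(worker);
    Thread t2 = new Thread(worker);
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println(shared.get()); // both threads converge on the best bound
  }
}
```

The trade-off versus a full shared queue is that only the k-th score is shared, not the candidates themselves, which is exactly why the queue visits fewer vertices in the table above but pays more per visit.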