tveasey commented on PR #12962:
URL: https://github.com/apache/lucene/pull/12962#issuecomment-1873862125

   IMO we shouldn't focus too much on recall, since the greediness of 
non-competitive search lets us tune this. My main concern is whether 
contention on queue updates causes a slowdown. That aside, I think the queue 
is strictly better.
   
   The search might wind up visiting fewer vertices with min-score sharing, 
because earlier decisions can by chance yield transiently better bounds. But 
this should be low probability, particularly when the search has to visit 
many vertices, and those are exactly the cases where we see big wins from 
using a queue.
   
   There appears to be some evidence of contention. This is suggested by 
comparing the actual runtime against the runtime we would expect from the 
number of vertices visited, e.g.
   
   | scenario | QPS(score) / QPS(queue) | Visited(queue) / Visited(score) |
   | --- | --- | --- |
   | n=10M, dim=100, k = 100, fo = 900 | 0.83 | 0.65 |
   | n=10M, dim=768, k = 100, fo = 900 | 0.76 | 0.68 |
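   To make the reasoning behind the table explicit (a back-of-envelope check, 
using the first row's numbers): if runtime were proportional to vertices 
visited, the QPS ratio should roughly match the visited ratio, so the gap 
between them estimates the extra per-vertex cost of the queue variant.

   ```java
   // Back-of-envelope sketch: estimate the per-vertex overhead of the queue
   // variant from the ratios in the table above.
   public class ContentionCheck {
     public static void main(String[] args) {
       double qpsRatio = 0.83;     // QPS(score) / QPS(queue), n=10M, dim=100, fo=900
       double visitedRatio = 0.65; // Visited(queue) / Visited(score)
       // time(queue)/time(score) = qpsRatio and work(queue)/work(score) = visitedRatio,
       // so costPerVertex(queue)/costPerVertex(score) = qpsRatio / visitedRatio.
       double perVertexCostRatio = qpsRatio / visitedRatio;
       System.out.printf("per-vertex cost ratio: %.2f%n", perVertexCostRatio); // ~1.28
     }
   }
   ```

   A ratio meaningfully above 1 is what suggests contention rather than just 
extra graph traversal.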
   
   Note that the direction of this effect is consistent, but its size is not 
(fo = 900 shows the largest effect). All that said, we still get significant 
performance wins, so my vote would be to use the queue and work on strategies 
for reducing contention; we discussed various ideas for ways to achieve this.
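   One such idea could be sketched as follows (illustrative only, not this 
PR's implementation, and `SharedTopK`/`minCompetitive` are hypothetical 
names): keep the shared top-k queue behind a lock, but publish its current 
minimum as a volatile field, so workers can reject non-competitive candidates 
with a cheap read and only take the lock for genuinely competitive scores.

   ```java
   import java.util.PriorityQueue;

   // Sketch of a shared top-k with a lock-free competitiveness check.
   class SharedTopK {
     private final int k;
     private final PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of scores
     private volatile float minCompetitive = Float.NEGATIVE_INFINITY;

     SharedTopK(int k) { this.k = k; }

     /** Cheap, lock-free check used on the hot path. */
     boolean isCompetitive(float score) {
       return score > minCompetitive;
     }

     /** Slow path: only entered when the lock-free check passes. */
     synchronized void offer(float score) {
       if (heap.size() < k) {
         heap.add(score);
       } else if (score > heap.peek()) {
         heap.poll();
         heap.add(score);
       } else {
         return; // lost the race: another thread raised the bound first
       }
       if (heap.size() == k) {
         minCompetitive = heap.peek(); // publish the new global bound
       }
     }
   }
   ```

   Workers would call `isCompetitive` before `offer`, so most non-competitive 
candidates never touch the lock at all.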


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

