dungba88 commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2693679959

   > I added an additional cap on this, but then realized we are already 
implicitly imposing such a limit here:
   
   @msokolov that check tests whether the *previous* iteration (kInLoop / 2) 
exceeded topk, not the *next* one. E.g., if the global k is 100 and the current 
loop has kInLoop = 60, then the next one would be 120, which can still exceed 
the global k. I think we should still cap the effective per-leaf k.
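
   A minimal sketch of the cap I have in mind. The class and method names here 
are hypothetical and not taken from the PR; it only assumes that kInLoop 
doubles on each pass, as in the example above:

```java
// Hypothetical sketch, not the PR's code: shows why checking the previous
// iteration is not enough and how an explicit cap on the next one behaves.
final class KCapSketch {

  // Assumption: the loop doubles kInLoop on every pass.
  // The cap keeps the next value from exceeding the global k.
  static int nextKInLoop(int kInLoop, int globalK) {
    long doubled = (long) kInLoop * 2; // long avoids int overflow
    return (int) Math.min(doubled, (long) globalK);
  }

  public static void main(String[] args) {
    int globalK = 100;
    int kInLoop = 60;
    // The previous iteration (30) did not exceed the global k, so a check on
    // it passes, yet the uncapped next value is 120.
    System.out.println("uncapped next k: " + (kInLoop * 2));
    System.out.println("capped next k:   " + nextKInLoop(kInLoop, globalK));
  }
}
```

   With k = 100 and kInLoop = 60 the uncapped value is 120 and the capped value 
is 100.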
   
   Unrelatedly, another observation: in some iterations, when only one or a few 
segments are left, we still use the per-leaf pro-rata as if we were running 
against every segment. For example, in an extreme case where one segment holds 
all the best matches but is very small, the per-leaf top k can only grow 
slowly with each pass (2 -> 4 -> 8 -> 16). That makes me think:
   - Could we readjust the pro-rata rate based not on the whole index but on 
the effective segments?
   - What if we just set the per-leaf k to the global k in the second pass and 
stop after the second pass? I'm curious about the overhead of re-entry vs. 
the benefit.
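
   To illustrate the first question, here is a rough sketch. It assumes a plain 
proportional share (k * leafDocs / baseDocs, rounded up), which is a 
simplification and not necessarily the formula the PR uses; all names and 
document counts are made up for the example:

```java
// Hypothetical sketch: compares a per-leaf k computed against the whole index
// with one computed against only the segments that are still being searched.
final class ProRataSketch {

  // Assumption: simple proportional share, clamped to [1, k].
  static int perLeafK(int k, long leafDocs, long baseDocs) {
    long share = (long) Math.ceil((double) k * leafDocs / baseDocs);
    return (int) Math.max(1L, Math.min((long) k, share));
  }

  public static void main(String[] args) {
    int k = 100;
    long indexDocs = 1_000_000L;  // all segments together
    long smallLeafDocs = 10_000L; // the one small segment that is left
    long remainingDocs = smallLeafDocs;

    // Share relative to the whole index: prints 1
    System.out.println(perLeafK(k, smallLeafDocs, indexDocs));
    // Share relative to the remaining segments only: prints 100
    System.out.println(perLeafK(k, smallLeafDocs, remainingDocs));
  }
}
```

   Under these assumptions the small segment gets the full global k as soon as 
it is the only one left, rather than having to double its way up from a small 
share.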


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

