dungba88 commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2693679959
> I added an additional cap on this, but then realized we are already implicitly imposing such a limit here:

@msokolov that checks whether the *previous* iteration (kInLoop / 2) has exceeded topk, not the *next* iteration. E.g., if the global k is 100 and the current loop has kInLoop = 60, then the next one would be 120 and can still exceed the global k. I think we should still cap the effective per-leaf k.

Unrelatedly, another observation I have is that in later iterations, when only one or a few segments are left, we are still using the per-leaf pro-rata as if we were still running against every segment. For example, in an extreme case one segment holds all the best matches, but that segment is so small that its per-leaf top k can only increase slowly after each pass (2 -> 4 -> 8 -> 16).

That makes me think:

- Could we readjust the pro-rata rate, based not on the whole index but on the effective segments?
- What if we just set the per-leaf k to the same value as the global k in the second pass, and stop after the second pass?

I'm curious about the overhead of re-entry vs. the benefit.
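To make the two ideas above concrete, here is a minimal Java sketch, not code from the PR. The class and method names (`PerLeafKSketch`, `nextPerLeafK`, `schedule`, `proRataK`) are hypothetical, and the plain proportional share used for the pro-rata is an assumption that may differ from what the PR actually computes. It only illustrates (a) capping the doubled per-leaf k at the global k and (b) computing a leaf's share over the segments still in scope rather than the whole index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Lucene.
final class PerLeafKSketch {

    // Doubles the per-leaf k for the next pass, but never beyond the global k.
    // Long arithmetic avoids int overflow when doubling.
    static int nextPerLeafK(int kInLoop, int globalK) {
        return (int) Math.min(2L * kInLoop, (long) globalK);
    }

    // Per-leaf k used on each successive pass, ending once the cap is reached.
    static List<Integer> schedule(int initialK, int globalK) {
        if (initialK < 1 || globalK < 1) {
            throw new IllegalArgumentException("k values must be >= 1");
        }
        List<Integer> ks = new ArrayList<>();
        int k = Math.min(initialK, globalK);
        ks.add(k);
        while (k < globalK) {
            k = nextPerLeafK(k, globalK);
            ks.add(k);
        }
        return ks;
    }

    // Assumed simplification: a leaf's share of the global k is proportional to
    // its doc count relative to docsInScope (whole index, or only the segments
    // that are still being searched). Rounded up, clamped to [1, globalK].
    static int proRataK(int globalK, long leafDocs, long docsInScope) {
        long share = (globalK * leafDocs + docsInScope - 1) / docsInScope;
        return (int) Math.max(1, Math.min(share, (long) globalK));
    }

    public static void main(String[] args) {
        // kInLoop = 60 with global k = 100: the next pass uses 100, not 120.
        System.out.println(schedule(60, 100));               // [60, 100]
        // A 1,000-doc leaf measured against a 1,000,000-doc index...
        System.out.println(proRataK(100, 1_000, 1_000_000)); // 1
        // ...versus against only the 2,000 docs still in scope.
        System.out.println(proRataK(100, 1_000, 2_000));     // 50
    }
}
```

With the whole index as the denominator, the small leaf starts at k = 1 and needs several doublings to catch up. Measured against only the remaining segments, it starts at 50 and reaches the global k one pass later.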