msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2692810655
Also, I forgot about this comment:

> Would it make sense to cap perLeafTopK by the original k? I think k is doubled on every iteration, and perLeafTopK can theoretically go over the original k, which is excessive.

I added an additional cap for this, but then realized we are already implicitly imposing such a limit here:

```
if (perLeaf.scoreDocs.length > 0
    && perLeaf.scoreDocs[perLeaf.scoreDocs.length - 1].score >= minTopKScore
    && perLeafTopKCalculation(kInLoop / 2, ctx.reader().maxDoc() / (float) reader.maxDoc()) <= k + 1) {
```

By the way, another thing we might want to try is relaxing this reentry check a bit by looking at the second- or third-worst per-leaf score, because in theory `lambda` created a buffer that should cause leaves to collect deeper than the best top K. This could enable this per-leaf strategy to outperform the global fanout? Anyway, it's easy to try.
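For concreteness, here is a minimal, self-contained sketch of what capping the per-leaf topK by the original k could look like. The body of `perLeafTopKCalculation` below is invented purely so the example compiles; it is not the PR's actual sizing helper, and `cappedPerLeafTopK` is a hypothetical name, not something in the patch.

```java
// Hypothetical sketch only -- not the PR's code.
class PerLeafTopKCapSketch {

  // Stand-in for the PR's sizing helper: scale k by the leaf's share of documents,
  // plus a little padding. Invented body, for illustration only.
  static int perLeafTopKCalculation(int k, float leafRatio) {
    return (int) Math.ceil(k * leafRatio) + 1;
  }

  static int cappedPerLeafTopK(int k, int kInLoop, int leafMaxDoc, int globalMaxDoc) {
    int perLeafTopK = perLeafTopKCalculation(kInLoop, leafMaxDoc / (float) globalMaxDoc);
    // The explicit cap: collecting more than the original k in one leaf is wasted work,
    // since at most k hits from any single leaf can end up in the global top k.
    return Math.min(perLeafTopK, k);
  }

  public static void main(String[] args) {
    // kInLoop has been doubled a few times; without the cap this leaf would collect 41 hits.
    System.out.println(cappedPerLeafTopK(10, 80, 500_000, 1_000_000)); // prints 10
  }
}
```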
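And a similarly hedged sketch of the relaxed reentry check: instead of testing whether the very worst collected per-leaf score still beats `minTopKScore`, probe the second- or third-worst entry. The method name `shouldRevisitLeaf` and the `slack` parameter are mine, not the PR's; `slack = 0` reproduces the existing check, and the other condition on `perLeafTopKCalculation` is omitted here.

```java
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Hypothetical sketch only -- names and structure are not from the PR.
class ReentryCheckSketch {

  /**
   * Decide whether a leaf's results warrant another, deeper pass.
   * slack = 0 reproduces the current check (the worst collected score must beat
   * minTopKScore); slack = 1 or 2 probes the second- or third-worst score instead,
   * which passes more readily since those scores are at least as high as the worst one.
   */
  static boolean shouldRevisitLeaf(TopDocs perLeaf, float minTopKScore, int slack) {
    ScoreDoc[] scoreDocs = perLeaf.scoreDocs; // sorted by descending score
    if (scoreDocs.length == 0) {
      return false;
    }
    int probe = Math.max(0, scoreDocs.length - 1 - slack);
    return scoreDocs[probe].score >= minTopKScore;
  }
}
```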