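jpountz commented on PR #14167:
URL: https://github.com/apache/lucene/pull/14167#issuecomment-2616390109

> I don't know of another query where multiple passes over a static dataset can return different docs.

Currently, this does not happen because Lucene only enables so-called "rank-safe" optimizations for top-k query processing in lexical search. So regardless of how search threads race with one another, `Top(ScoreDoc|Field)CollectorManager` are guaranteed to always return the same (correct) hits. However, if we were to enable "rank-unsafe" optimizations (e.g. https://github.com/apache/lucene/pull/12446), we would observe the same issue that you are seeing here.

I suspect that users may indeed struggle with this behavior, e.g. if running the same query multiple times on an e-commerce website doesn't return the same hits every time. It probably makes it hard to write integration tests as well. I believe that the Anserini IR toolkit wouldn't be happy either, given how much it cares about reproducibility.

The direction that you are suggesting makes sense to me, though I have no idea how hard it is to implement.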
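To make the rank-safe vs. rank-unsafe distinction concrete, here is a minimal sketch in plain Java. It does not use Lucene's API; the class, record, and rule names are hypothetical. It simulates two thread interleavings sequentially by processing index slices in different orders against one shared set of collected hits, which stands in for a shared minimum competitive score. The rank-safe rule only skips a hit whose score is strictly below the current k-th best score, so it can never drop a true top-k hit. The rank-unsafe rule is an assumption for illustration (it inflates the threshold by 1.5x) and is not a model of what PR #12446 actually does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopKDeterminismSketch {

    // A scored hit; ties on score are broken by ascending doc id.
    record Hit(int doc, float score) {}

    // Best hits first: higher score, then lower doc id.
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed().thenComparingInt(Hit::doc);

    interface SkipRule { boolean skip(float score, float threshold); }

    // Rank-safe: skip only hits strictly worse than the current k-th best score.
    static final SkipRule RANK_SAFE = (score, threshold) -> score < threshold;

    // Hypothetical rank-unsafe rule: prunes more aggressively by inflating the threshold.
    static final SkipRule RANK_UNSAFE = (score, threshold) -> score < threshold * 1.5f;

    // Simulates one interleaving: slices are processed in the given order, all sharing
    // one set of collected hits (a stand-in for a shared minimum competitive score).
    static List<Integer> search(List<List<Hit>> slicesInOrder, int k, SkipRule rule) {
        List<Hit> collected = new ArrayList<>();
        for (List<Hit> slice : slicesInOrder) {
            for (Hit hit : slice) {
                // Only prune once k hits have been collected, i.e. once a threshold exists.
                if (collected.size() >= k && rule.skip(hit.score(), kthBestScore(collected, k))) {
                    continue;
                }
                collected.add(hit);
            }
        }
        collected.sort(BEST_FIRST);
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < Math.min(k, collected.size()); i++) {
            docs.add(collected.get(i).doc());
        }
        return docs;
    }

    // Score of the k-th best hit collected so far (requires hits.size() >= k).
    static float kthBestScore(List<Hit> hits, int k) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(BEST_FIRST);
        return sorted.get(k - 1).score();
    }

    public static void main(String[] args) {
        List<Hit> slice0 = List.of(new Hit(0, 1.0f), new Hit(1, 3.0f));
        List<Hit> slice1 = List.of(new Hit(2, 2.0f), new Hit(3, 2.5f));
        int k = 2;
        // Order A: slice0 is processed first. Order B: slice1 is processed first.
        System.out.println("safe   A=" + search(List.of(slice0, slice1), k, RANK_SAFE)
            + " B=" + search(List.of(slice1, slice0), k, RANK_SAFE));
        System.out.println("unsafe A=" + search(List.of(slice0, slice1), k, RANK_UNSAFE)
            + " B=" + search(List.of(slice1, slice0), k, RANK_UNSAFE));
    }
}
```

With k = 2, the rank-safe rule returns docs [1, 3] for both orders, while the rank-unsafe rule returns [1, 2] or [1, 3] depending on which slice is processed first. That order dependence is the kind of non-reproducibility described above.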
> I don't know of another query where multiple passes over a static dataset can return different docs. Currently, this does not happen because Lucene only enables so-called "rank-safe" optimizations to top-k query processing for lexical search. So regardless of how search threads race with one another, `Top(ScoreDoc|Field)CollectorManager` are guaranteed to always return the same (correct) hits. However, would we enable "rank-unsafe" optimizations (e.g. https://github.com/apache/lucene/pull/12446), we would be observing the same issue that you are seeing here. I suspect that users may indeed struggle with this behavior, e.g. if running the same query multiple times on an e-commerce website doesn't return the same hits every time. It probably makes it hard to write integration tests as well. I believe that the Anserini IR toolkit wouldn't be happy either given how much it cares about reproducibility. The direction that you are suggesting makes sense to me, I have no idea how hard it is. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org