Re: [PR] Add knn result consistency test [lucene]

via GitHub Mon, 27 Jan 2025 09:07:30 -0800


jpountz commented on PR #14167:
URL: https://github.com/apache/lucene/pull/14167#issuecomment-2616390109


   > I don't know of another query where multiple passes over a static dataset 
can return different docs.
   
   Currently, this does not happen because Lucene only enables so-called 
"rank-safe" optimizations to top-k query processing for lexical search. So 
regardless of how search threads race with one another, 
`Top(ScoreDoc|Field)CollectorManager` are guaranteed to always return the same 
(correct) hits. However, if we enabled "rank-unsafe" optimizations (e.g. 
https://github.com/apache/lucene/pull/12446), we would observe the same 
issue that you are seeing here.
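
To illustrate why rank-safe merging is insensitive to thread timing, here is a minimal sketch in plain Java. It is not Lucene code: `RankSafeMergeSketch`, `Hit`, `topK`, and `search` are hypothetical names, and the shuffled arrival order only stands in for search threads finishing in a different order on each run. The assumption is that hits are compared by score and then by doc ID, so the top-k is uniquely defined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names exist in Lucene.
final class RankSafeMergeSketch {

  // A scored hit: a doc ID plus its score.
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  // Total order: higher score first, then lower doc ID. Two hits with
  // different doc IDs never compare as equal, so the top-k is unique.
  static final Comparator<Hit> ORDER = (a, b) -> {
    int byScore = Float.compare(b.score, a.score);
    return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
  };

  // Exact top-k of a list of hits (stand-in for one slice's collector).
  static List<Hit> topK(List<Hit> hits, int k) {
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(ORDER);
    return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
  }

  // Merges per-slice top-k lists in a seed-dependent arrival order and
  // returns the doc IDs of the final top-k.
  static List<Integer> search(List<List<Hit>> slices, int k, long seed) {
    List<List<Hit>> arrival = new ArrayList<>(slices);
    Collections.shuffle(arrival, new Random(seed)); // simulated thread race
    List<Hit> merged = new ArrayList<>();
    for (List<Hit> slice : arrival) {
      merged.addAll(topK(slice, k));
      merged = topK(merged, k); // keep only the best k seen so far
    }
    List<Integer> docs = new ArrayList<>();
    for (Hit h : merged) {
      docs.add(h.doc);
    }
    return docs;
  }
}
```

Calling `search` with different seeds on the same slices returns the same doc IDs, because no step discards a hit that belongs in the exact top-k. A rank-unsafe variant that pruned hits based on what other slices had already reported would lose that property.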
   
   I suspect that users may indeed struggle with this behavior, e.g. if running 
the same query multiple times on an e-commerce website doesn't return the same 
hits every time. It probably makes it hard to write integration tests as well. 
I believe that the Anserini IR toolkit wouldn't be happy either given how much 
it cares about reproducibility. The direction that you are suggesting makes 
sense to me, though I have no idea how hard it would be.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


