jtibshirani commented on PR #873: URL: https://github.com/apache/lucene/pull/873#issuecomment-1129427428
Sorry for jumping in late with some thoughts. Because of the approximate nature of HNSW, we are not guaranteed that the graph search will collect all documents with the same score. There could always be a document with a lower doc ID that the graph search misses, because it decided not to explore that part of the graph. So while this PR makes it more likely to return the lowest doc IDs, I still don't think we can state a helpful guarantee to the user. This makes me wonder if we should even be trying to tiebreak by doc ID during the graph search?
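The failure mode described above can be reproduced on a toy graph. The sketch below is not Lucene's `HnswGraphSearcher`. It is a simplified single-layer best-first search with beam width `ef` that breaks score ties in favor of the lower doc ID. `GRAPH`, `SCORES`, `greedy_search` and `exact_top_k` are hypothetical names made up for illustration. Docs 2 and 5 tie on score, but doc 2 sits behind a low-scoring neighbor that the search never expands. As a result, the tie-break never gets a chance to prefer it.

```python
import heapq

# Hypothetical toy graph: doc id -> neighbor doc ids (single layer, no hierarchy).
GRAPH = {0: [5, 1], 1: [0, 2], 2: [1], 5: [0]}
# Hypothetical similarity of each doc to the query (higher is better).
SCORES = {0: 0.1, 1: 0.05, 2: 0.9, 5: 0.9}


def greedy_search(graph, scores, entry, ef):
    """Simplified HNSW-style best-first search that tie-breaks on lower doc id.

    Returns up to `ef` (doc, score) pairs sorted by score desc, doc id asc.
    """
    visited = {entry}
    # Candidates to expand: best score first; on ties, lower doc id first.
    candidates = [(-scores[entry], entry)]
    # Result min-heap: the top is the "worst" result, i.e. the lowest score
    # and, among equal scores, the highest doc id.
    results = [(scores[entry], -entry)]
    while candidates:
        neg_score, doc = heapq.heappop(candidates)
        # Stop once the best remaining candidate cannot beat the worst result.
        if len(results) >= ef and -neg_score < results[0][0]:
            break
        for nb in graph[doc]:
            if nb in visited:
                continue
            visited.add(nb)
            s = scores[nb]
            # Accept if there is room, or if nb beats the worst result
            # (higher score, or equal score with a lower doc id).
            if len(results) < ef or (s, -nb) > results[0]:
                heapq.heappush(candidates, (-s, nb))
                heapq.heappush(results, (s, -nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted(((-neg_doc, s) for s, neg_doc in results),
                  key=lambda t: (-t[1], t[0]))


def exact_top_k(scores, k):
    """Brute-force top-k with the same tie-break (score desc, doc id asc)."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    print(greedy_search(GRAPH, SCORES, entry=0, ef=1))  # [(5, 0.9)]
    print(exact_top_k(SCORES, k=1))                     # [(2, 0.9)]
```

With `ef=1`, the search jumps to doc 5 and never expands doc 1, so it returns doc 5 even though doc 2 has the same score and a lower ID. Raising `ef` to 4 happens to reach doc 2 in this graph. However, no finite beam width guarantees this in general, which supports the point that a doc-ID tie-break inside the graph search can only make the lowest IDs more likely, not certain.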