jpountz commented on PR #873: URL: https://github.com/apache/lucene/pull/873#issuecomment-1129949546
Good question. In my opinion, the important part is that the `TopDocs` returned by `KnnVectorsReader#search` are ordered by score, then by doc ID. Otherwise logic like `TopDocs#merge` would get very confused: it assumes top docs come in descending score order, then ascending doc ID order. So we could potentially leave most of the existing logic untouched and re-sort after the HNSW search to make sure the order meets `TopDocs`'s expectations.

That said, even though we can't have strong guarantees, I feel like tie-breaking by doc ID as part of the HNSW search still reduces surprises. E.g. today, when there are lots of ties, if you run a first search with k=10 and then a second one with k=20, many of the new hits would get prepended rather than appended to the top hits. I understand there's no guarantee either way, but this would still be very surprising.

I feel less strongly about this part, so I'm happy to follow the re-sorting approach if tie-breaking by doc ID as part of the HNSW search proves controversial.
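To make the re-sorting option concrete, here is a minimal sketch of ordering hits by descending score and then ascending doc ID. It is not the code from this PR: `Hit`, `SCORE_THEN_DOC` and `resort` are hypothetical names, and `Hit` is only a stand-in for Lucene's `ScoreDoc` so that the example runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.Comparator;

public class TieBreakSketch {

    /** Hypothetical stand-in for Lucene's ScoreDoc: a doc ID and its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Descending score first, then ascending doc ID to break ties. */
    static final Comparator<Hit> SCORE_THEN_DOC = (a, b) -> {
        int cmp = Float.compare(b.score, a.score);
        return cmp != 0 ? cmp : Integer.compare(a.doc, b.doc);
    };

    /** Returns a re-sorted copy; the input array is left untouched. */
    static Hit[] resort(Hit[] hits) {
        Hit[] sorted = hits.clone();
        Arrays.sort(sorted, SCORE_THEN_DOC);
        return sorted;
    }

    public static void main(String[] args) {
        // Three hits tie at 0.9 and arrive in arbitrary order,
        // as they might after a graph search without tie-breaking.
        Hit[] raw = {
            new Hit(7, 0.9f), new Hit(3, 0.9f), new Hit(5, 1.0f), new Hit(1, 0.9f)
        };
        for (Hit h : resort(raw)) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
        // Doc order after re-sorting: 5, 1, 3, 7
    }
}
```

A sort like this only fixes the order of the hits that were returned. It does not control which of the tied docs make it into the top k in the first place, which is why tie-breaking inside the HNSW search is the stronger option.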