[GitHub] [lucene] jpountz commented on pull request #873: LUCENE-10397: KnnVectorQuery doesn't tie break by doc ID

GitBox Wed, 18 May 2022 05:33:31 -0700


jpountz commented on PR #873:
URL: https://github.com/apache/lucene/pull/873#issuecomment-1129949546


   Good question. In my opinion, what matters is that the TopDocs 
returned by `KnnVectorsReader#search` are ordered by score, then by doc ID. 
Otherwise logic like `TopDocs#merge` would get very confused: it assumes top 
docs come in descending score order, then ascending doc ID order. So we 
could potentially leave most of the existing logic untouched and re-sort after 
the HNSW search to make sure the order meets `TopDocs`'s expectations.
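A minimal sketch of the re-sorting approach, assuming a hypothetical `Hit` record as a stand-in for Lucene's `ScoreDoc` (this is not Lucene code, and `TopDocsOrder`, `Hit`, and `resort` are illustrative names). It shows the ordering `TopDocs#merge` expects: descending score, then ascending doc ID. Records need Java 16 or later.

```java
import java.util.Arrays;
import java.util.Comparator;

public class TopDocsOrder {
    // Hypothetical stand-in for a scored hit (Lucene's real type is ScoreDoc).
    record Hit(int doc, float score) {}

    // Descending score first, then ascending doc ID to break ties.
    static final Comparator<Hit> TOP_DOCS_ORDER =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed()
            .thenComparingInt(Hit::doc);

    // Re-sort hits in place after an approximate search that may not
    // have produced them in TopDocs order.
    static Hit[] resort(Hit[] hits) {
        Arrays.sort(hits, TOP_DOCS_ORDER);
        return hits;
    }

    public static void main(String[] args) {
        Hit[] hits = {
            new Hit(7, 0.9f), new Hit(3, 0.5f), new Hit(2, 0.9f), new Hit(5, 0.5f)
        };
        for (Hit h : resort(hits)) {
            System.out.println(h.doc() + " " + h.score());
        }
    }
}
```

Running `main` lists docs 2, 7, 3, 5: the two 0.9 hits come first, and within each score tie the lower doc ID wins.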
   
   That said, even though we can't have strong guarantees, I feel like 
tie-breaking by doc ID as part of the HNSW search still reduces surprises. For 
example, today, when there are lots of ties, if you run a first search with 
k=10 and then a second one with k=20, many of the new hits get prepended 
rather than appended to the top hits. I understand there's no guarantee either 
way, but this would still be very surprising. I feel less strongly about this 
part, so I'm happy to follow the re-sorting approach if tie-breaking by doc ID 
as part of the HNSW search proves controversial.
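A minimal sketch of why tie-breaking by doc ID during collection avoids the "prepended hits" surprise. It uses a brute-force bounded heap over hypothetical scores, not HNSW graph traversal (names like `TieBreakTopK` and `topK` are illustrative). When every document ties, the k=10 result is exactly the first 10 entries of the k=20 result. With HNSW the visited set can still differ between k values, so this only shows the collector's side of the argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TieBreakTopK {
    // Hypothetical scored hit, not a Lucene type.
    record Hit(int doc, float score) {}

    // Best-first order: higher score wins, then lower doc ID wins.
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed()
            .thenComparingInt(Hit::doc);

    // Bounded heap whose head is the least competitive hit
    // (lowest score, then highest doc ID).
    static List<Hit> topK(float[] scores, int k) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (int doc = 0; doc < scores.length; doc++) {
            Hit candidate = new Hit(doc, scores[doc]);
            if (heap.size() < k) {
                heap.add(candidate);
            } else if (BEST_FIRST.compare(candidate, heap.peek()) < 0) {
                // Candidate beats the current worst hit: evict and insert.
                heap.poll();
                heap.add(candidate);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(BEST_FIRST);
        return result;
    }

    public static void main(String[] args) {
        float[] scores = new float[100];
        Arrays.fill(scores, 1.0f); // every document ties
        List<Hit> top10 = topK(scores, 10);
        List<Hit> top20 = topK(scores, 20);
        // With doc ID tie-breaking, the larger result extends the smaller one.
        System.out.println(top20.subList(0, 10).equals(top10));
    }
}
```

Because a tied candidate with a higher doc ID never displaces an existing hit, raising k only appends hits after the ones already returned.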


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

