msokolov opened a new pull request, #13910: URL: https://github.com/apache/lucene/pull/13910
While exploring some recall-related failures in another PR, I went looking for a unit test that checks HNSW/KNN recall and couldn't find one. I think we used to have one, but maybe we removed it because it was flaky? We really do need such a test, since it is possible to make changes that preserve all the formal properties of the codecs and queries yet destroy recall. I thought a test built on known data and vectors would be more predictable than one using random data, so I made one, and it uncovered a couple of bugs:

- In Lucene90HnswVectorsReader we messed up (removed) the ord-to-doc mappings, so we were returning vector ords instead of docids in search results. I guess this would have totally borked back-compat for Lucene90 indexes. Probably there are none in the wild, so this was never noticed?
- In Lucene91RWFormat (used only for back-compat testing) we messed up the diversity check, so we were producing bad graphs.

This PR fixes both bugs and adds the new test.
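To make the idea concrete, here is a minimal, Lucene-free sketch of what a deterministic recall check can look like. It is not the test added in this PR. The class `RecallSketch`, its methods, the four hand-written vectors, and the `ordToDoc` table are all hypothetical and exist only for illustration. The sketch computes an exact top-k by brute force, maps vector ordinals to docids, and measures recall. It also shows how returning ords in place of docids can drive recall to zero when the two numbering schemes differ.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

/** Hypothetical sketch of a recall check over fixed data; not Lucene code. */
public class RecallSketch {

  /** Squared Euclidean distance between two vectors of equal length. */
  public static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Exact (brute-force) top-k vector ordinals for a query, nearest first. */
  public static int[] exactTopK(float[][] vectors, float[] query, int k) {
    return IntStream.range(0, vectors.length)
        .boxed()
        .sorted(Comparator.comparingDouble(
            (Integer ord) -> squaredDistance(vectors[ord], query)))
        .limit(k)
        .mapToInt(Integer::intValue)
        .toArray();
  }

  /** Translates vector ordinals to document ids through an ord-to-doc table. */
  public static int[] toDocIds(int[] ords, int[] ordToDoc) {
    return Arrays.stream(ords).map(ord -> ordToDoc[ord]).toArray();
  }

  /** Fraction of the expected ids that also appear in the actual results. */
  public static double recall(int[] expected, int[] actual) {
    Set<Integer> found = new HashSet<>();
    for (int id : actual) {
      found.add(id);
    }
    int hits = 0;
    for (int id : expected) {
      if (found.contains(id)) {
        hits++;
      }
    }
    return (double) hits / expected.length;
  }

  public static void main(String[] args) {
    // Assumed data: four fixed vectors stored in docs 2, 4, 6 and 8,
    // so vector ordinals and docids never coincide.
    float[][] vectors = {{0, 0}, {1, 0}, {0, 1}, {5, 5}};
    int[] ordToDoc = {2, 4, 6, 8};
    float[] query = {0.9f, 0.1f};

    int[] topOrds = exactTopK(vectors, query, 2);
    int[] expectedDocs = toDocIds(topOrds, ordToDoc);

    // A searcher that applies the mapping matches the exact answer.
    System.out.println(recall(expectedDocs, expectedDocs)); // prints 1.0
    // A searcher that returns raw ords instead of docids matches nothing.
    System.out.println(recall(expectedDocs, topOrds)); // prints 0.0
  }
}
```

Because the data is fixed, the expected neighbors are known in advance, so a check like this cannot be flaky the way a random-data test can. A real test would compare the results of the HNSW search against the brute-force answer and assert that recall stays above some threshold.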