[ https://issues.apache.org/jira/browse/LUCENE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julie Tibshirani updated LUCENE-9695:
-------------------------------------
    Attachment: Screen Shot 2021-10-05 at 9.50.53 AM.png

> Don't include deleted documents when merging vectors
> ----------------------------------------------------
>
>                 Key: LUCENE-9695
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9695
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>         Attachments: Screen Shot 2021-10-05 at 9.50.53 AM.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> While testing HNSW searches with multi-segment indexes, all kinds of strange
> things were happening; recall was radically different for a force-merged
> multi-segment index than for the same index built as a single segment. Most
> testing I've done to date has been with single-segment indexes, shame on me.
>
> One issue is that when merging we iterate over all the vectors from 0 ..
> size-1. But this size was being calculated without taking deletions into
> account, which caused deleted vectors to be included in the graph, leading
> to exceptions and weird inconsistencies.
>
> The other issue has to do with aliasing in the diverse neighbor selection
> graph construction heuristic introduced recently. Sometimes the vectors to
> be compared would be drawn from the same VectorValues, but this is a no-no
> since they are then the same vector (the first one is overwritten when the
> second one is fetched). This leads to poor results rather than errors per
> se, but it also made the results unpredictable in a way that causes the test
> written to reproduce the first issue to fail. Thus I'll include both fixes
> together.
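
The two problems described in the issue can be illustrated with a small, self-contained Java sketch. It does not use Lucene's API: `VectorMergeSketch`, `ReusingVectorReader`, `liveOrdinals`, `aliasedDistance`, and `safeDistance` are hypothetical names invented for this example, and the actual fix in Lucene may be structured differently. The sketch only assumes what the description states: the merge loop must skip deleted documents, and a reader that reuses one buffer hands back the same array on every fetch.

```java
import java.util.stream.IntStream;

/** Illustration only; none of these names are Lucene APIs. */
class VectorMergeSketch {

    /** Hypothetical reader that reuses one scratch buffer for every vector it returns. */
    static final class ReusingVectorReader {
        private final float[][] data;
        private final float[] scratch;

        ReusingVectorReader(float[][] data) {
            this.data = data;
            this.scratch = new float[data[0].length];
        }

        /** Returns the shared scratch buffer; the next call overwrites its contents. */
        float[] vectorValue(int ord) {
            System.arraycopy(data[ord], 0, scratch, 0, scratch.length);
            return scratch;
        }

        /** A second reader over the same data with its own scratch buffer. */
        ReusingVectorReader copy() {
            return new ReusingVectorReader(data);
        }
    }

    /** Ordinals of live (non-deleted) documents only, instead of 0 .. size-1. */
    static int[] liveOrdinals(boolean[] live) {
        return IntStream.range(0, live.length).filter(i -> live[i]).toArray();
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Aliasing bug: both fetches return the same array, so the distance is always 0. */
    static float aliasedDistance(ReusingVectorReader reader, int i, int j) {
        float[] a = reader.vectorValue(i);
        float[] b = reader.vectorValue(j); // overwrites the contents of a
        return squaredDistance(a, b);
    }

    /** One way to avoid the alias: fetch the second vector through an independent reader. */
    static float safeDistance(ReusingVectorReader reader, int i, int j) {
        ReusingVectorReader other = reader.copy();
        return squaredDistance(reader.vectorValue(i), other.vectorValue(j));
    }

    public static void main(String[] args) {
        float[][] data = {{0f, 0f}, {3f, 4f}};
        ReusingVectorReader reader = new ReusingVectorReader(data);
        System.out.println(aliasedDistance(reader, 0, 1)); // prints 0.0
        System.out.println(safeDistance(reader, 0, 1));    // prints 25.0
    }
}
```

With the aliased fetch every pair of candidates looks identical (distance 0), which is consistent with the poor but error-free results described above for a diversity heuristic that compares neighbors against each other. Copying the first vector before fetching the second would be another way to break the alias.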