[ https://issues.apache.org/jira/browse/LUCENE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273019#comment-17273019 ]
ASF subversion and git services commented on LUCENE-9695:
---------------------------------------------------------

Commit 38ec2602cee395029497af992be7a217f97229d3 in lucene-solr's branch refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=38ec260 ]

LUCENE-9695: don't merge deleted vectors (#2239)

> Don't include deleted documents when merging vectors
> ----------------------------------------------------
>
>                 Key: LUCENE-9695
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9695
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> While testing HNSW searches with multi-segment indexes, all kinds of strange
> things were happening; recall performance was radically different for a
> force-merged multi-segment index than for the same index built as a single
> segment. Most testing I've done to date has been with single-segment indexes,
> shame on me.
>
> One issue is that when merging we iterate over all the vectors from 0 ..
> size-1. But this size was being calculated without taking deletions into
> account, and this caused deleted vectors to be included in the graph leading
> to exceptions and weird inconsistencies.
>
> The other issue has to do with aliasing in the diverse neighbor selection
> graph construction heuristic introduced recently. Sometimes vectors to be
> compared would be drawn from the same VectorValues, but this is a no-no since
> they are then the same vector (the first one will be overwritten when the
> second one is fetched). This leads to poor results, but not errors per se,
> but the results also became unpredictable in a way that causes the test
> written to reproduce the first issue to fail. Thus I'll include both fixes
> together.
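
The two problems described in the issue can be illustrated with a small, self-contained Java sketch. This is not the code from the commit and does not use Lucene's API: `VectorMergeSketch`, `SharedBufferVectors`, `liveOrdinals`, `aliasedDistance`, and `independentDistance` are hypothetical names invented here, and the buffer-reusing reader is an assumption that only mimics the behavior the issue attributes to VectorValues (the first vector is overwritten when the second is fetched).

```java
import java.util.Arrays;

// Illustrative sketch only; all names are hypothetical and not part of Lucene.
final class VectorMergeSketch {

    /** Stand-in for a vector reader that reuses one scratch buffer for every fetch. */
    static final class SharedBufferVectors {
        private final float[][] data;
        private final float[] scratch;

        SharedBufferVectors(float[][] data) {
            this.data = data;
            this.scratch = new float[data[0].length];
        }

        /** Returns the SAME array on every call; its contents are overwritten each time. */
        float[] vectorValue(int ord) {
            System.arraycopy(data[ord], 0, scratch, 0, scratch.length);
            return scratch;
        }
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Aliasing bug: both vectors come from one reader, so a and b are the same array. */
    static float aliasedDistance(float[][] data, int i, int j) {
        SharedBufferVectors values = new SharedBufferVectors(data);
        float[] a = values.vectorValue(i);
        float[] b = values.vectorValue(j); // overwrites the contents of a
        return squaredDistance(a, b);      // always 0
    }

    /** One way to avoid the aliasing: fetch each vector from its own reader. */
    static float independentDistance(float[][] data, int i, int j) {
        float[] a = new SharedBufferVectors(data).vectorValue(i);
        float[] b = new SharedBufferVectors(data).vectorValue(j);
        return squaredDistance(a, b);
    }

    /** Deletion-aware iteration: keep only ordinals whose document is still live. */
    static int[] liveOrdinals(boolean[] live) {
        int count = 0;
        for (boolean isLive : live) {
            if (isLive) count++;
        }
        int[] ords = new int[count]; // sized by live docs, not by the total
        int next = 0;
        for (int ord = 0; ord < live.length; ord++) {
            if (live[ord]) ords[next++] = ord;
        }
        return ords;
    }

    public static void main(String[] args) {
        float[][] data = {{0f, 0f}, {3f, 4f}, {1f, 1f}};
        System.out.println(aliasedDistance(data, 0, 1));     // 0.0 (wrong)
        System.out.println(independentDistance(data, 0, 1)); // 25.0
        System.out.println(Arrays.toString(liveOrdinals(new boolean[] {true, false, true}))); // [0, 2]
    }
}
```

In the sketch, sizing the merged set from the live documents rather than from the total count corresponds to the first fix, and reading the two vectors through separate readers corresponds to the second; how the actual commit achieves either is not shown here.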
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org