[ https://issues.apache.org/jira/browse/LUCENE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273019#comment-17273019 ]

ASF subversion and git services commented on LUCENE-9695:
---------------------------------------------------------

Commit 38ec2602cee395029497af992be7a217f97229d3 in lucene-solr's branch 
refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=38ec260 ]

LUCENE-9695: don't merge deleted vectors (#2239)



> Don't include deleted documents when merging vectors
> ----------------------------------------------------
>
>                 Key: LUCENE-9695
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9695
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> While testing HNSW searches with multi-segment indexes, all kinds of strange 
> things were happening; recall performance was radically different for a 
> force-merged multi-segment index than for the same index built as a single 
> segment. Most testing I've done to date has been with single-segment indexes, 
> shame on me.
> One issue is that when merging we iterate over all the vectors from 0 ..
> size-1. But this size was being calculated without taking deletions into
> account, so deleted vectors were included in the graph, leading to
> exceptions and weird inconsistencies.
> The other issue has to do with aliasing in the diverse neighbor selection
> graph construction heuristic introduced recently. Sometimes the vectors to
> be compared would be drawn from the same VectorValues, but this is a no-no
> since they are then the same vector (the first one is overwritten when the
> second one is fetched). This leads to poor results rather than errors per
> se, but the results also became unpredictable in a way that causes the test
> written to reproduce the first issue to fail. Thus I'll include both fixes
> together.
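
For readers skimming the archive, the two problems described above can be
illustrated with a small standalone sketch. It is not the code from the
commit. Every class and method name here (VectorMergeSketch,
ReusingVectorSource, liveOrdinals, and so on) is a hypothetical stand-in, and
the boolean[] deletion mask is an assumption made to keep the example free of
Lucene dependencies. The first helper counts only live vectors; the second
pair shows how reading two vectors through one buffer-reusing source makes
them alias each other, and how an independent copy avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these names are Lucene APIs. */
final class VectorMergeSketch {

  /** Hypothetical reader that reuses one scratch buffer for every vector. */
  static final class ReusingVectorSource {
    private final float[][] stored;
    private final float[] scratch;

    ReusingVectorSource(float[][] stored) {
      this.stored = stored;
      this.scratch = new float[stored[0].length];
    }

    /** Returns the shared scratch buffer, overwritten on each call. */
    float[] vectorValue(int ord) {
      System.arraycopy(stored[ord], 0, scratch, 0, scratch.length);
      return scratch;
    }

    /** Independent view of the same data with its own scratch buffer. */
    ReusingVectorSource copy() {
      return new ReusingVectorSource(stored);
    }
  }

  /** Collects ordinals of live vectors, so the merged size excludes deletions. */
  static int[] liveOrdinals(int maxOrd, boolean[] deleted) {
    List<Integer> live = new ArrayList<>();
    for (int ord = 0; ord < maxOrd; ord++) {
      if (!deleted[ord]) {
        live.add(ord);
      }
    }
    return live.stream().mapToInt(Integer::intValue).toArray();
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Buggy pattern: both reads share one buffer, so a and b are the same array. */
  static float aliasedDistance(ReusingVectorSource src, int i, int j) {
    float[] a = src.vectorValue(i);
    float[] b = src.vectorValue(j); // overwrites the contents of a
    return squaredDistance(a, b);   // always 0, whatever i and j are
  }

  /** Safer pattern: the second vector comes from an independent copy. */
  static float safeDistance(
      ReusingVectorSource src, ReusingVectorSource other, int i, int j) {
    return squaredDistance(src.vectorValue(i), other.vectorValue(j));
  }

  public static void main(String[] args) {
    float[][] data = {{0f, 0f}, {3f, 4f}, {6f, 8f}};
    boolean[] deleted = {false, true, false};
    ReusingVectorSource src = new ReusingVectorSource(data);

    System.out.println(Arrays.toString(liveOrdinals(data.length, deleted))); // [0, 2]
    System.out.println(aliasedDistance(src, 0, 1));                          // 0.0
    System.out.println(safeDistance(src, src.copy(), 0, 1));                 // 25.0
  }
}
```

The aliased variant never throws, which matches the description above: the
distances are silently wrong, so neighbor selection degrades without any
visible error.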


