[ 
https://issues.apache.org/jira/browse/LUCENE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julie Tibshirani updated LUCENE-9695:
-------------------------------------
    Attachment: Screen Shot 2021-10-05 at 9.50.53 AM.png

> Don't include deleted documents when merging vectors
> ----------------------------------------------------
>
>                 Key: LUCENE-9695
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9695
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>         Attachments: Screen Shot 2021-10-05 at 9.50.53 AM.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> While testing HNSW searches with multi-segment indexes, all kinds of strange 
> things were happening; recall performance was radically different for a 
> force-merged multi-segment index than for the same index built as a single 
> segment. Most testing I've done to date has been with single-segment indexes, 
> shame on me.
> One issue is that when merging we iterate over all the vectors from 0 .. 
> size-1. But this size was being calculated without taking deletions into 
> account, so deleted vectors were included in the graph, leading to 
> exceptions and weird inconsistencies.
> The other issue has to do with aliasing in the diverse neighbor selection 
> heuristic recently introduced for graph construction. Sometimes the vectors 
> to be compared would be drawn from the same VectorValues, but this is a 
> no-no since they are then the same vector (the first one is overwritten 
> when the second one is fetched). This leads to poor results rather than 
> errors per se, but the results also became unpredictable in a way that 
> caused the test written to reproduce the first issue to fail. Thus I'll 
> include both fixes together.
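
The first issue described above (a size computed without regard to
deletions) can be illustrated with a minimal sketch. This is not Lucene's
merge code: the class LiveVectorMergeSketch, the method mergeLive, and the
use of java.util.BitSet as a stand-in for live docs are all hypothetical,
and the sketch assumes exactly one vector per document.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
final class LiveVectorMergeSketch {

    /**
     * Collects the vectors of live documents only. Assumes vectors[doc] is the
     * single vector of document `doc`, and that a set bit in liveDocs means
     * the document has not been deleted.
     */
    static List<float[]> mergeLive(float[][] vectors, BitSet liveDocs) {
        List<float[]> merged = new ArrayList<>();
        for (int doc = 0; doc < vectors.length; doc++) {
            if (liveDocs.get(doc)) {
                merged.add(vectors[doc]); // deleted documents are skipped
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        float[][] vectors = {{0f}, {1f}, {2f}, {3f}};
        BitSet liveDocs = new BitSet(vectors.length);
        liveDocs.set(0, vectors.length); // all documents start out live
        liveDocs.clear(2);               // document 2 has been deleted

        List<float[]> merged = mergeLive(vectors, liveDocs);
        // The graph should be built over merged.size() vectors,
        // not over vectors.length.
        System.out.println("size ignoring deletions: " + vectors.length);
        System.out.println("size of merged input:    " + merged.size());
    }
}
```

The point of the sketch is only that the iteration bound for graph
construction has to come from the filtered collection rather than from the
pre-deletion count.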

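The aliasing problem in the issue description (two vectors fetched from the
same VectorValues turning out to be the same array) can be reproduced with a
small stand-alone sketch. ReusingVectors below is a hypothetical reader that
returns one shared scratch buffer on every lookup; it is an assumption made
for illustration, not the Lucene VectorValues API. Reading from two
independent instances is shown as one possible way to avoid the overwrite,
not necessarily the fix applied in the issue.

```java
// Hypothetical illustration only; none of these names come from Lucene.
final class VectorAliasingSketch {

    /** A reader that reuses a single scratch buffer for every lookup. */
    static final class ReusingVectors {
        private final float[][] data;
        private final float[] scratch;

        ReusingVectors(float[][] data) {
            this.data = data;
            this.scratch = new float[data[0].length];
        }

        /** Copies vector `ord` into the shared buffer and returns that buffer. */
        float[] vectorValue(int ord) {
            System.arraycopy(data[ord], 0, scratch, 0, scratch.length);
            return scratch;
        }
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Buggy: both references point at the same buffer, so the result is 0. */
    static float aliasedDistance(float[][] data, int i, int j) {
        ReusingVectors values = new ReusingVectors(data);
        float[] a = values.vectorValue(i);
        float[] b = values.vectorValue(j); // overwrites the contents of a
        return squaredDistance(a, b);
    }

    /** One possible remedy: read each operand from its own reader instance. */
    static float independentDistance(float[][] data, int i, int j) {
        ReusingVectors first = new ReusingVectors(data);
        ReusingVectors second = new ReusingVectors(data);
        return squaredDistance(first.vectorValue(i), second.vectorValue(j));
    }

    public static void main(String[] args) {
        float[][] data = {{0f, 0f}, {3f, 4f}};
        System.out.println("aliased:     " + aliasedDistance(data, 0, 1));
        System.out.println("independent: " + independentDistance(data, 0, 1));
    }
}
```

In the aliased case every comparison sees a distance of zero, which is
consistent with the description's "poor results, but not errors": nothing
throws, yet a distance-based selection heuristic would be working from
meaningless values.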


--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
