[GitHub] [lucene] jmazanec15 commented on issue #12533: Init HNSW merge with graph containing deleted documents

via GitHub Wed, 13 Sep 2023 09:48:46 -0700


jmazanec15 commented on issue #12533:
URL: https://github.com/apache/lucene/issues/12533#issuecomment-1717981259

Additionally, the [FreshDiskANN](https://arxiv.org/pdf/2105.09613.pdf) paper
did some work in this space. They ran a test for NSG in which they repeat
the following process for a number of cycles and track the recall:
1. delete 5% of the index
2. patch the incident nodes that were impacted via local neighborhoods
(similar to @zhaih (1))
3. reinsert the deleted nodes
4. measure recall
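
To make the cycle above concrete, here is a rough Python sketch of that kind of experiment. It is only a toy: it uses a flat proximity graph rather than NSG or HNSW, it is not the FreshDiskANN code, and every name and parameter in it (`insert`, `delete_and_patch`, `search`, `run_cycles`, `degree`, `beam`) is made up for illustration. The "keep the nearest `degree` candidates" rule is a stand-in for whatever pruning rule a real index would use.

```python
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vectors, ids, q, k):
    # Brute-force: the k ids closest to q.
    return sorted(ids, key=lambda i: dist(vectors[i], q))[:k]

def insert(graph, vectors, node, degree):
    # Link the node to its nearest live nodes and add capped reverse edges.
    nbrs = nearest(vectors, [i for i in graph if i != node], vectors[node], degree)
    graph[node] = set(nbrs)
    for n in nbrs:
        graph[n].add(node)
        if len(graph[n]) > degree:
            graph[n] = set(nearest(vectors, graph[n], vectors[n], degree))

def delete_and_patch(graph, vectors, victims, degree):
    # Steps 1 and 2: drop the victims, then patch every node that pointed at
    # one of them using the victim's own neighborhood (assumed prune rule:
    # keep the nearest `degree` candidates).
    victims = set(victims)
    removed = {v: graph.pop(v) for v in victims}
    for node, nbrs in graph.items():
        hit = nbrs & victims
        if hit:
            cands = nbrs - victims
            for v in hit:
                cands |= removed[v] - victims
            cands.discard(node)
            graph[node] = set(nearest(vectors, cands, vectors[node], degree))

def search(graph, vectors, q, k, entry, beam=16):
    # Simple best-first search that keeps a pool of the closest nodes seen.
    pool, expanded = {entry}, set()
    while True:
        top = nearest(vectors, pool, q, beam)
        todo = [n for n in top if n not in expanded]
        if not todo:
            return top[:k]
        expanded.add(todo[0])
        pool |= graph[todo[0]]

def run_cycles(n=200, dim=8, degree=8, k=5, cycles=3, seed=0):
    rng = random.Random(seed)
    vectors = [[rng.random() for _ in range(dim)] for _ in range(n)]
    queries = [[rng.random() for _ in range(dim)] for _ in range(20)]
    graph = {}
    for i in range(n):
        insert(graph, vectors, i, degree)
    recalls = []
    for _ in range(cycles):
        victims = rng.sample(range(n), n // 20)             # 1. delete 5%
        delete_and_patch(graph, vectors, victims, degree)   # 2. patch
        for v in victims:                                   # 3. reinsert
            insert(graph, vectors, v, degree)
        entry = next(iter(graph))
        hits = 0
        for q in queries:                                   # 4. measure recall
            truth = set(nearest(vectors, range(n), q, k))
            hits += len(truth & set(search(graph, vectors, q, k, entry)))
        recalls.append(hits / (k * len(queries)))
    return recalls

if __name__ == "__main__":
    print(run_cycles())
```

Skipping the patch loop in `delete_and_patch` (only dropping the dangling edges) would roughly correspond to the unpatched variant of the test. The numbers from a toy like this say nothing about real NSG or HNSW behavior.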

They ran a similar test for HNSW, but without patching the edges. In both
cases, they saw some degradation:

![Screenshot 2023-09-13 at 9 40 10
AM](https://github.com/apache/lucene/assets/19438237/49f2eff7-7388-4e9d-a877-9421b2c9f790).

Their intuition is that the graphs become sparser as this process repeats,
leading to less navigability. The graphs become sparser because the pruning
policy is more strict.
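
As a rough illustration of how a stricter pruning rule leaves fewer edges, here is a small Python sketch of a diversity-style prune. The `alpha` knob and the function name `diversity_prune` are my assumptions about what "strict" could look like, not something taken from the comment above: a candidate is kept only if no already-kept neighbor is closer to it (scaled by `alpha`) than the node itself, so a smaller `alpha` rejects more candidates.

```python
import math

def diversity_prune(p, candidates, alpha, max_degree):
    # Visit candidates from nearest to farthest and keep one only if every
    # already-kept neighbor s satisfies alpha * d(s, c) > d(p, c).
    kept = []
    for c in sorted(candidates, key=lambda c: math.dist(p, c)):
        if all(alpha * math.dist(s, c) > math.dist(p, c) for s in kept):
            kept.append(c)
        if len(kept) == max_degree:
            break
    return kept

p = (0.0, 0.0)
candidates = [(1.0, 0.0), (2.0, 0.0), (2.0, 0.5), (0.0, 1.5), (-1.0, -1.0)]

strict = diversity_prune(p, candidates, alpha=1.0, max_degree=4)
relaxed = diversity_prune(p, candidates, alpha=2.5, max_degree=4)
print(len(strict), len(relaxed))
```

With these points the strict setting keeps 3 edges and the relaxed one keeps 4, because the two candidates sitting behind `(1.0, 0.0)` are rejected when `alpha` is 1.0. If that kind of rule is reapplied on every delete and reinsert cycle, it is easy to see how the out-degree could drift down over time.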

In their system, they employ an algorithm similar to @zhaih (1), where they
connect the incident edges and prune based on some criteria, and that
approach shows promise.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

