jmazanec15 commented on issue #12533: URL: https://github.com/apache/lucene/issues/12533#issuecomment-1703089850
> 1. We could remove the deleted node from the graph, fully connected all its neighbours, and do diverse check on those neighbors to remove extra links. In case the deleted node is an entry node, we can insert the closest node of the deleted node upto where the deleted node existed.

What do you mean by "fully connect its neighbors"? Would this basically mean finding the to-be-deleted node's in-edges and reinserting the nodes on those edges into the graph with the normal edge-selection strategy, excluding the deleted node, to "patch" the broken connections? We looked into this a bit recently, but the number of reinserts grows pretty fast. It might be promising, though, to start the search for replacement neighbors from the node being removed (as opposed to starting from the global entry point).

With this approach, I think we would need a way to avoid quality drift after the graph has been modified like this over several generations, since selecting edges during a repair is different from building the graph. For instance, repeated refinement over time may cause the long-distance-hop neighbors added early on to start disappearing. Would the diversity check help in this case? Also, at a certain point it will probably be better to just rebuild the graph from scratch, which suggests we may need to choose a threshold.

> 2. We tolerate certain amount of deletions (like 10% ~ 20%) inside HNSW graph and just use them as connections.

There was some discussion around this in hnswlib: https://github.com/nmslib/hnswlib/issues/4#issuecomment-678315156. In practice, this would probably work well, but I'm not sure how to choose the right number of deletions to tolerate. That said, I agree with @mbrette that a hybrid approach might be good.
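A minimal sketch of the local repair discussed under option 1, for a single graph layer only. Assumptions (none of this is Lucene's API): the graph is a dict from node id to neighbor list, distance is Euclidean, each affected node's candidate pool is its surviving neighbors plus the deleted node's out-neighbors, and the "diverse check" is taken to be the usual HNSW heuristic (keep a candidate only if it is closer to the node than to every neighbor already kept). `select_diverse` and `repair_after_delete` are hypothetical names; entry-point replacement and upper layers are left out.

```python
import math


def select_diverse(node, candidates, vectors, max_conn):
    """HNSW-style diversity heuristic: visit candidates nearest-first and keep
    one only if it is closer to `node` than to every neighbor already kept."""
    kept = []
    ranked = sorted(candidates, key=lambda c: math.dist(vectors[node], vectors[c]))
    for c in ranked:
        d_node = math.dist(vectors[node], vectors[c])
        if all(d_node < math.dist(vectors[c], vectors[s]) for s in kept):
            kept.append(c)
            if len(kept) == max_conn:
                break
    return kept


def repair_after_delete(graph, vectors, deleted, max_conn):
    """Remove `deleted` from a single-layer graph (node id -> neighbor list) and
    re-select links for every node that had an edge to it, using only local
    candidates instead of a search from the global entry point."""
    out_neighbors = graph.pop(deleted)
    for u, nbrs in list(graph.items()):
        if deleted not in nbrs:
            continue
        # Candidate pool: u's surviving links plus the deleted node's out-links.
        pool = (set(nbrs) | set(out_neighbors)) - {u, deleted}
        graph[u] = select_diverse(u, pool, vectors, max_conn)
    return graph


vectors = {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (3.0,)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(repair_after_delete(graph, vectors, deleted=1, max_conn=2))
# {0: [2], 2: [3, 0], 3: [2]}
```

Because the pool is purely local, a long-range link a node had before can be pruned in favor of a closer candidate, and nothing in this step brings it back; that is one way the quality drift raised in the comment could accumulate over repeated deletions.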
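A minimal sketch of option 2, again single-layer and with hypothetical names: deleted nodes stay in the graph as tombstones that search still expands (so they keep acting as connections) but never returns, similar in spirit to hnswlib's mark-delete behavior. The `rebuild_ratio` default of 0.2 is only an illustrative pick from the quoted 10% ~ 20% range, not a recommended value.

```python
import heapq
import math


class TombstoneGraph:
    """Single-layer proximity graph whose deleted nodes remain as routing-only
    tombstones until a (hypothetical) rebuild threshold is crossed."""

    def __init__(self, graph, vectors, rebuild_ratio=0.2):
        self.graph = graph                  # node id -> list of neighbor ids
        self.vectors = vectors              # node id -> vector
        self.deleted = set()                # tombstoned node ids
        self.rebuild_ratio = rebuild_ratio  # illustrative, not a tuned value

    def delete(self, node):
        self.deleted.add(node)

    def needs_rebuild(self):
        # Signal a full rebuild once tombstones exceed the tolerated fraction.
        return len(self.deleted) / len(self.graph) > self.rebuild_ratio

    def search(self, query, entry, k, ef):
        """Best-first beam search. Tombstones are expanded like live nodes so
        they still route, but only live nodes enter the result set."""

        def dist(n):
            return math.dist(query, self.vectors[n])

        visited = {entry}
        frontier = [(dist(entry), entry)]   # min-heap of nodes to expand
        best = []                           # max-heap (negated) of live results
        if entry not in self.deleted:
            best.append((-dist(entry), entry))
        while frontier:
            d_c, c = heapq.heappop(frontier)
            if len(best) >= ef and d_c > -best[0][0]:
                break  # closest unexpanded node is farther than the worst kept result
            for n in self.graph[c]:
                if n in visited:
                    continue
                visited.add(n)
                d_n = dist(n)
                if len(best) < ef or d_n < -best[0][0]:
                    heapq.heappush(frontier, (d_n, n))
                    if n not in self.deleted:
                        heapq.heappush(best, (-d_n, n))
                        if len(best) > ef:
                            heapq.heappop(best)  # drop the current worst
        return [n for _, n in sorted((-nd, n) for nd, n in best)][:k]


vectors = {i: (float(i),) for i in range(5)}
g = TombstoneGraph({0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}, vectors)
g.delete(2)
print(g.search((2.1,), entry=0, k=2, ef=4))  # [3, 1]: node 3 is reached via tombstone 2
print(g.needs_rebuild())                     # False: 1/5 is not above 0.2
```

Picking `rebuild_ratio` is exactly the open question in the comment; the sketch only shows where such a threshold would plug in alongside tombstone routing.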