zhaih opened a new issue, #12533: URL: https://github.com/apache/lucene/issues/12533
### Description

Since #12050, when merging HNSW graphs we can initialize the merged graph from an existing one instead of inserting every node from scratch. However, we added the constraint that the initializing graph must not contain deletions, because we do not have a good way to handle deletions inside a graph without reconstructing it. This makes the optimization somewhat wasteful in practice: the bigger a segment is, the less likely it is to be deletion-free, especially in a hybrid text-plus-embedding index.

Opening this issue to discuss how we could solve this problem or find a workaround. My very naive ideas:

1. Remove each deleted node from the graph, fully connect all of its neighbours to each other, and run the diversity check on those neighbours to prune the extra links. If the deleted node is an entry node, we can promote its closest neighbour up to the levels where the deleted node existed.
2. Tolerate a certain amount of deletions (say 10%&ndash;20%) inside the HNSW graph and keep using the deleted nodes purely as connections.

I personally prefer 1, since 2 requires changing the search behavior accordingly and sounds a bit fragile. Any other ideas are welcome!
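To make idea 1 concrete, here is a minimal single-layer sketch of the reconnect-and-prune step, using a plain adjacency map rather than Lucene's actual `HnswGraph` classes. All names (`HnswDeleteSketch`, `removeNode`, the diversity heuristic as written) are hypothetical illustrations, not existing Lucene APIs, and the entry-node promotion case is not shown:

```java
import java.util.*;

/**
 * Sketch of idea (1): delete a node from an HNSW-style graph by offering its
 * former neighbours to each other as candidates, then pruning the new links
 * with a diversity check. Single layer only; not Lucene's real data structures.
 */
public class HnswDeleteSketch {
  final Map<Integer, List<Integer>> adj = new HashMap<>(); // node -> neighbours
  final Map<Integer, float[]> vectors = new HashMap<>();
  final int maxConn; // max neighbours to keep per node after pruning

  HnswDeleteSketch(int maxConn) { this.maxConn = maxConn; }

  void addNode(int id, float[] vector, List<Integer> neighbours) {
    vectors.put(id, vector);
    adj.put(id, new ArrayList<>(neighbours));
    for (int n : neighbours) adj.computeIfAbsent(n, k -> new ArrayList<>()).add(id);
  }

  static float dist(float[] a, float[] b) { // squared Euclidean distance
    float s = 0;
    for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
    return s;
  }

  /** Remove {@code deleted}; its former neighbours consider each other as new
   *  candidate links, kept only if they pass the diversity check. */
  void removeNode(int deleted) {
    List<Integer> orphans = adj.remove(deleted);
    vectors.remove(deleted);
    for (int n : orphans) adj.get(n).remove((Integer) deleted);
    for (int n : orphans) {
      List<Integer> prev = new ArrayList<>(adj.get(n));
      // candidates = current neighbours plus the deleted node's other neighbours
      Set<Integer> cand = new HashSet<>(prev);
      for (int o : orphans) if (o != n) cand.add(o);
      List<Integer> sorted = new ArrayList<>(cand);
      sorted.sort(Comparator.comparingDouble(c -> dist(vectors.get(n), vectors.get(c))));
      // diversity heuristic: keep c only if it is closer to n than to any kept neighbour
      List<Integer> kept = new ArrayList<>();
      for (int c : sorted) {
        if (kept.size() >= maxConn) break;
        float dToN = dist(vectors.get(n), vectors.get(c));
        boolean diverse = true;
        for (int k : kept) {
          if (dist(vectors.get(c), vectors.get(k)) < dToN) { diverse = false; break; }
        }
        if (diverse) kept.add(c);
      }
      // rewire symmetrically: drop n from pruned neighbours, add it to kept ones
      adj.put(n, new ArrayList<>(kept));
      for (int p : prev) if (!kept.contains(p)) adj.get(p).remove((Integer) n);
      for (int k : kept) if (!adj.get(k).contains(n)) adj.get(k).add(n);
    }
  }
}
```

Note that a full implementation would also need to bound the neighbour lists of nodes that *receive* new back-links, and to repeat this per layer; the cost is roughly quadratic in the deleted node's degree, which is what makes the diversity pruning step important.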