zhaih opened a new issue, #12533:
URL: https://github.com/apache/lucene/issues/12533

   ### Description
   
   Currently, when merging HNSW graphs we can start from an existing graph 
rather than inserting every node from scratch, thanks to #12050. However, we 
have a constraint that the init graph cannot be one that has deletions, since 
we do not have a good way to handle deletions within a graph without 
reconstructing it. This makes the nice optimization somewhat wasteful, because 
in practice the bigger the segment, the less likely it is to have no deletions, 
especially in a hybrid text-embedding index.
   
   Opening this issue to discuss how we could solve this problem or have a work 
around.
   My very naive ideas:
   1. We could remove the deleted node from the graph, fully connect all of 
its neighbors to each other, and run a diversity check on those neighbors to 
prune the extra links. In case the deleted node is an entry node, we can 
promote the deleted node's closest neighbor up to the level where the deleted 
node existed.
   2. We could tolerate a certain amount of deletions (say 10% ~ 20%) inside 
the HNSW graph and just keep using the deleted nodes as connections.
   
   I personally prefer 1, since 2 requires changing the search behavior 
accordingly and sounds a bit fragile. Any other ideas are welcome!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

