jmazanec15 commented on issue #12533:
URL: https://github.com/apache/lucene/issues/12533#issuecomment-1703089850

   > 1. We could remove the deleted node from the graph, fully connected all 
its neighbours, and do diverse check on those neighbors to remove extra links. 
In case the deleted node is an entry node, we can insert the closest node of 
the deleted node upto where the deleted node existed.
   
   What do you mean by fully connecting its neighbors? Would this basically 
mean figuring out the to-be-deleted node's in-edges and reinserting them into 
the graph using the normal edge selection strategy, excluding the deleted 
node, to "patch" the broken connections? We looked into this a little bit 
recently, but the number of reinserts grows pretty fast. It might be 
promising, though, to start finding replacement neighbors from the neighbor 
that is being removed (as opposed to starting from the global entry point). I 
think with this approach we would need to figure out a way to avoid quality 
drift after the graph has been manipulated this way over several generations, 
since the edge selection strategy used for patching is different from the one 
used when building the graph. For instance, refinement over time may mean that 
the long-distance hop neighbors added early on would start to disappear. Would 
the diversity check help in this case? Also, I think at a certain point it 
will be better to just rebuild the graph from scratch, suggesting a threshold 
might need to be selected.
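
   To make the "patch" idea concrete, here is a rough single-layer sketch of 
what I have in mind. It is not Lucene code: `select_diverse` and 
`remove_and_patch` are hypothetical names, the graph is a plain dict of 
out-edge sets, and the diversity rule is the usual HNSW-style heuristic (keep 
a candidate only if it is closer to the base node than to any neighbor already 
kept). Replacement candidates come only from the deleted node's own 
neighborhood rather than from a search starting at the global entry point.

```python
import math


def select_diverse(base, candidates, vectors, max_conn):
    """Keep a candidate only if it is closer to `base` than to every
    neighbor already kept (HNSW-style diversity heuristic)."""
    ordered = sorted(
        candidates, key=lambda c: (math.dist(vectors[c], vectors[base]), c)
    )
    kept = []
    for c in ordered:
        if len(kept) == max_conn:
            break
        d_base = math.dist(vectors[c], vectors[base])
        if all(d_base < math.dist(vectors[c], vectors[s]) for s in kept):
            kept.append(c)
    return set(kept)


def remove_and_patch(graph, vectors, deleted, max_conn):
    """Remove `deleted` and re-select neighbors for every node that pointed
    at it, using only local candidates (assumption: no global re-search)."""
    # Nodes whose out-edges reference the deleted node (its in-edges).
    in_nodes = [u for u, nbrs in graph.items() if u != deleted and deleted in nbrs]
    out_nbrs = graph.pop(deleted)
    for u in in_nodes:
        # Candidates: u's surviving neighbors plus the deleted node's neighbors.
        candidates = (graph[u] | out_nbrs) - {u, deleted}
        graph[u] = select_diverse(u, candidates, vectors, max_conn)
    return in_nodes
```

   The number of patched nodes equals the in-degree of the deleted node, which 
is where the reinsert cost comes from. The sketch does nothing about the 
quality drift concern above.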
   
   > 2. We tolerate certain amount of deletions (like 10% ~ 20%) inside HNSW 
graph and just use them as connections.
   
   There was some discussion around this in hnswlib: 
https://github.com/nmslib/hnswlib/issues/4#issuecomment-678315156. In practice, 
this would probably work well, but I am not really sure how to choose the 
correct number of deletions. I agree with @mbrette, though: it might be good 
to take a hybrid approach.
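
   As a rough illustration of tolerating deletions, the sketch below keeps 
deleted nodes in the graph as connections and only filters them out of the 
results, with a separate check for when to give up and rebuild. This is a 
simplified single-layer best-first search, not Lucene's or hnswlib's 
implementation; `search`, `should_rebuild`, and the 0.2 default (taken from 
the 10% ~ 20% range quoted above) are assumptions for illustration.

```python
import heapq
import math


def search(graph, vectors, deleted, entry, query, k, ef):
    """Best-first search that traverses deleted nodes but never returns them."""
    def dist(n):
        return math.dist(vectors[n], query)

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap on distance
    results = []                         # max-heap (negated distance), live nodes only
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest open candidate is worse than the worst result.
        if len(results) >= ef and d > -results[0][0]:
            break
        if node not in deleted:
            heapq.heappush(results, (-d, node))
            if len(results) > ef:
                heapq.heappop(results)
        # Deleted nodes still contribute their edges to the traversal.
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                heapq.heappush(candidates, (dist(nbr), nbr))
    ranked = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in ranked][:k]


def should_rebuild(num_deleted, num_total, max_deleted_ratio=0.2):
    """Hybrid policy: tolerate deletions up to a ratio, then rebuild."""
    return num_total > 0 and num_deleted / num_total > max_deleted_ratio
```

   In a chain 0-1-2-3 where node 2 is deleted, a query near node 3 can still 
reach it because node 2 keeps acting as a bridge.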


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

