msokolov commented on issue #13687: URL: https://github.com/apache/lucene/issues/13687#issuecomment-2313000464
In graph theory "strongly connected" means every node in the graph can be reached from every other node. In a bidirectional (undirected) graph its the same as "connected". But in a directed graph it's possible that every node can be reached from some node (it is said to be "rooted" in that node) but not from every node. HNSW graphs are initially constructed as bidirectional, but because we impose a maximum number of outbound connections for each node (the "M" parameter), and use a pruning strategy to remove some nodes as we exceed that limit, the graph ends up only being partially bidirectional and theoretically has to be treated as a directed graph. HNSW "graphs" are composed of multiple layers, each of which is a graph in its own right. Searches on HNSW graphs start from a single entry point on the highest layer, find the nearest node to the query on the next level (starting from the entry point) and use that result as the entry point for the next level down, until the bottommost level, where the top K results are retained. The connectivity metric we now have checks for each level whether every node is reachable from at least one of the nodes on the next higher level, so it's more like checking that the entire graph is "rooted" in the entry point. In theory this says that any node can potentially be found as a result, a desirable property I think. However it seems possible that there can be areas of the level 0 graph that are cut off from other areas (starting from points on that level)? And if we somehow make a bad choice on the prior level, we could get in a situation where some results can't be found. I don't know if this occurs in practice or should be a matter for concern. I guess there could be some exploration of whether there is any such problem, and if there is, how to address it. Maybe it would involve some stroinger connectivity test, not sure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org