msokolov commented on issue #13687:
URL: https://github.com/apache/lucene/issues/13687#issuecomment-2313000464

   In graph theory "strongly connected" means every node in the graph can be 
reached from every other node. In a bidirectional (undirected) graph its the 
same as "connected". But in a directed graph it's possible that every node can 
be reached from some node (it is said to be "rooted" in that node) but not from 
every node. HNSW graphs are initially constructed as bidirectional, but because 
we impose a maximum number of outbound connections for each node (the "M" 
parameter), and use a pruning strategy to remove some nodes as we exceed that 
limit, the graph ends up only being partially bidirectional and theoretically 
has to be treated as a directed graph.
   
   HNSW "graphs" are composed of multiple layers, each of which is a graph in 
its own right.
   
   Searches on HNSW graphs start from a single entry point on the highest 
layer, find the nearest node to the query on the next level (starting from the 
entry point) and use that result as the entry point for the next level down, 
until the bottommost level, where the top K results are retained.
   
   The connectivity metric we now have checks for each level whether every node 
is reachable from at least one of the nodes on the next higher level, so it's 
more like checking that the entire graph is "rooted" in the entry point. In 
theory this says that any node can potentially be found as a result, a 
desirable property I think.
   
   However it seems possible that there can be areas of the level 0 graph that 
are cut off from other areas (starting from points on that level)? And if we 
somehow make a bad choice on the prior level, we could get in a situation where 
some results can't be found. I don't know if this occurs in practice or should 
be a matter for concern. I guess there could be some exploration of whether 
there is any such problem, and if there is, how to address it. Maybe it would 
involve some stroinger connectivity test, not sure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to