msokolov commented on code in PR #13566:
URL: https://github.com/apache/lucene/pull/13566#discussion_r1676154393


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswUtil.java:
##########
@@ -30,46 +30,83 @@
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.LeafReaderContext;
 import org.apache.lucene.util.FixedBitSet;
-import org.apache.lucene.util.hnsw.HnswGraph;
 
 /** Utilities for use in tests involving HNSW graphs */
-public class HnswTestUtil {
+public class HnswUtil {
 
   /**
    * Returns true iff level 0 of the graph is fully connected - that is every 
node is reachable from
    * any entry point.
    */
   public static boolean isFullyConnected(HnswGraph knnValues) throws 
IOException {
-    return componentSizes(knnValues).size() < 2;
+    for (int level = 0; level < knnValues.numLevels(); level++) {
+      if (components(knnValues, level).size() > 1) {
+        return false;
+      }
+    }
+    return true;
   }
 
   /**
    * Returns the sizes of the distinct graph components on level 0. If the 
graph is fully-connected
    * there will only be a single component. If the graph is empty, the 
returned list will be empty.
    */
   public static List<Integer> componentSizes(HnswGraph hnsw) throws 
IOException {
-    List<Integer> sizes = new ArrayList<>();
+    return componentSizes(hnsw, 0);
+  }
+
+  /**
+   * Returns the sizes of the distinct graph components on the given level. If 
the graph is
+   * fully-connected there will only be a single component. If the graph is 
empty, the returned list
+   * will be empty.
+   */
+  public static List<Integer> componentSizes(HnswGraph hnsw, int level) throws 
IOException {
+    return components(hnsw, level).stream().map(Component::size).toList();
+  }
+
+  // Finds all connected components of the graph
+  static List<Component> components(HnswGraph hnsw, int level) throws 
IOException {

Review Comment:
   After reading that I realized that this whole notion of components really is 
only well-defined for undirected graphs. Ours are directed, so this isn't 
really the same thing after all. Actually it raises a question about this whole 
business because we can make it so every node is reachable from node 0, but 
that still doesn't guarantee that every node is reachable from every other 
node. I still think this is progress, but we might want to think about how to 
measure this more global property?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to