msokolov commented on code in PR #12651:
URL: https://github.com/apache/lucene/pull/12651#discussion_r1359884135


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -284,6 +285,13 @@ int graphNextNeighbor(HnswGraph graph) throws IOException {
     return graph.nextNeighbor();
   }
 
+  private static int getGraphSize(HnswGraph graph) {
+    if (graph instanceof OnHeapHnswGraph) {

Review Comment:
   I see - can be a follow-up, but perhaps we introduce `capacity()` to 
`HnswGraph` so we can use it uniformly without casting



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -99,27 +122,31 @@ public void addNode(int level, int node) {
       entryNode = node;
     }
 
-    if (level > 0) {
-      // if the new node introduces a new level, add more levels to the graph,
-      // and make this node the graph's new entry point
-      if (level >= numLevels) {
-        for (int i = numLevels; i <= level; i++) {
-          graphUpperLevels.add(new HashMap<>());
-        }
-        numLevels = level + 1;
-        entryNode = node;
+    if (node >= graph.length) {
+      if (noGrowth) {
+        throw new AssertionError(

Review Comment:
   This should either be `assert node < graph.length || noGrowth == false: 
"...message..."` or else we should throw an `IllegalArgumentException` - I 
don't think we ought to be throwing `AssertionError` in production code.



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -99,27 +122,31 @@ public void addNode(int level, int node) {
       entryNode = node;
     }
 
-    if (level > 0) {
-      // if the new node introduces a new level, add more levels to the graph,
-      // and make this node the graph's new entry point
-      if (level >= numLevels) {
-        for (int i = numLevels; i <= level; i++) {
-          graphUpperLevels.add(new HashMap<>());
-        }
-        numLevels = level + 1;
-        entryNode = node;
+    if (node >= graph.length) {
+      if (noGrowth) {
+        throw new AssertionError(
+            "The graph does not expect to growth when an initial size is 
given");

Review Comment:
   syntax: growth -> grow



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -32,39 +31,54 @@
  */
 public final class OnHeapHnswGraph extends HnswGraph implements Accountable {
 
+  private static final int INIT_SIZE = 128;
+
   private int numLevels; // the current number of levels in the graph
   private int entryNode; // the current graph entry node on the top level. -1 
if not set
 
-  // Level 0 is represented as List<NeighborArray> – nodes' connections on 
level 0.
-  // Each entry in the list has the top maxConn/maxConn0 neighbors of a node. 
The nodes correspond
-  // to vectors
-  // added to HnswBuilder, and the node values are the ordinals of those 
vectors.
-  // Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals.
-  private final List<NeighborArray> graphLevel0;
-  // Represents levels 1-N. Each level is represented with a Map that maps a 
levels level 0
-  // ordinal to its neighbors on that level. All nodes are in level 0, so we 
do not need to maintain
-  // it in this list. However, to avoid changing list indexing, we always will 
make the first
-  // element
-  // null.
-  private final List<Map<Integer, NeighborArray>> graphUpperLevels;
-  private final int nsize;
-  private final int nsize0;
+  // the internal graph representation where the first dimension is node id 
and second dimension is
+  // level
+  // e.g. graph[1][2] is all the neighbours of node 1 at level 2
+  private NeighborArray[][] graph;
+  // essentially another 2d map which the first dimension is level and second 
dimension is node id,
+  // this is only
+  // generated on demand when there's someone calling getNodeOnLevel on a 
non-zero level
+  private List<Integer>[] levelToNodes;
+  private int
+      lastFreezeSize; // remember the size we are at last time to freeze the 
graph and generate
+  // levelToNodes
+  private int size; // graph size, which is number of nodes in level 0
+  private int
+      nonZeroLevelSize; // total number of NeighborArrays created that is not 
on level 0, for now it
+  // is only used to account memory usage
+  private final int nsize; // neighbour array size at non-zero level
+  private final int nsize0; // neighbour array size at zero level
+  private final boolean
+      noGrowth; // if an initial size is passed in, we don't expect the graph 
to grow itself
 
   // KnnGraphValues iterator members
   private int upto;
   private NeighborArray cur;
 
-  OnHeapHnswGraph(int M) {
+  /**
+   * ctor
+   *
+   * @param numNodes number of nodes that will be added to this graph, passing 
in -1 means unbounded
+   *     while passing in a non-negative value will lock the whole graph and 
disable the graph from
+   *     growth itself (you cannot add a node with has id >= numNodes)

Review Comment:
   syntax nits, should read: "growing itself (you cannot add a node having id 
>= numNodes)" 



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -158,50 +185,82 @@ public int entryNode() {
     return entryNode;
   }
 
+  /**
+   * WARN: calling this method will essentially iterate through all nodes at 
level 0 (even if you're
+   * not getting node at level 0), we have built some caching mechanism such 
that if graph is not
+   * changed only the first non-zero level call will pay the cost. So it is 
highly NOT recommended
+   * to call this method while the graph is still building.
+   *
+   * <p>NOTE: if the node is not inserted in order (e.g. we're init'd from 
another graph) such that
+   * there are some node in middle are not inserted. (e.g. the largest node 
inserted is 10 but node
+   * 5 is not yet inserted) Calling getNodesOnLevel(0) will lead to a wrong 
behavior because it is a
+   * simple iterating over range(size) TODO: maybe we should fix the above 
behavior?

Review Comment:
   Could we do 
   ```
   if (size != capacity) throw new IllegalStateException("graph build not 
complete, size=" + size + " capacity=" + capacity);
   ```



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -74,23 +88,32 @@ public final class OnHeapHnswGraph extends HnswGraph 
implements Accountable {
    * @param node the node whose neighbors are returned, represented as an 
ordinal on the level 0.
    */
   public NeighborArray getNeighbors(int level, int node) {
-    if (level == 0) {
-      return graphLevel0.get(node);
-    }
-    Map<Integer, NeighborArray> levelMap = graphUpperLevels.get(level);
-    assert levelMap.containsKey(node);
-    return levelMap.get(node);
+    assert graph.length > node && graph[node] != null && graph[node][level] != 
null;

Review Comment:
   I think we might actually prefer not to assert the first two conditions so 
that we can distinguish NPE from AIOOBE from AssertionError?



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -74,23 +88,32 @@ public final class OnHeapHnswGraph extends HnswGraph 
implements Accountable {
    * @param node the node whose neighbors are returned, represented as an 
ordinal on the level 0.
    */
   public NeighborArray getNeighbors(int level, int node) {
-    if (level == 0) {
-      return graphLevel0.get(node);
-    }
-    Map<Integer, NeighborArray> levelMap = graphUpperLevels.get(level);
-    assert levelMap.containsKey(node);
-    return levelMap.get(node);
+    assert graph.length > node && graph[node] != null && graph[node][level] != 
null;
+    return graph[node][level];
   }
 
   @Override
   public int size() {
-    return graphLevel0.size(); // all nodes are located on the 0th level
+    return size;
+  }
+
+  /**
+   * When we initialize from another graph, the capacity, which is length of 
internal buffer of
+   * graph, is different from {@link #size()}, which is number of nodes 
already inserted, such that
+   * we need two method to retrieve each
+   *
+   * @return length of internal representation of graph
+   */
+  public int capacity() {
+    return graph.length;
   }
 
   /**
    * Add node on the given level. Nodes can be inserted out of order, but it 
requires that the nodes
    * preceded by the node inserted out of order are eventually added.
    *
+   * <p>NOTE: You must add a node from the node's top level

Review Comment:
   maybe expand to "You must add a node **starting** from the node's top level"?



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -40,31 +41,29 @@ public final class OnHeapHnswGraph extends HnswGraph 
implements Accountable {
   // to vectors
   // added to HnswBuilder, and the node values are the ordinals of those 
vectors.
   // Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals.
-  private final List<NeighborArray> graphLevel0;

Review Comment:
   ok, thanks I didn't see that



##########
lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java:
##########
@@ -454,77 +454,6 @@ public void testSearchWithSelectiveAcceptOrds() throws 
IOException {
     }
   }
 
-  public void testBuildOnHeapHnswGraphOutOfOrder() throws IOException {

Review Comment:
   ok this test is no longer interesting, but could we add a few new tests 
checking the error states:
   
   1. attempt to add nodes with levels out of order
   2. attempt to add nodes out of bounds to pre-allocated graph
   3. attempt to retrieve nodes iterator on incomplete graph
   
   Also maybe we still a need test where we add nodes out of order? I'm not 
sure how well that is covered



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to