msokolov commented on code in PR #12651: URL: https://github.com/apache/lucene/pull/12651#discussion_r1359884135
########## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ########## @@ -284,6 +285,13 @@ int graphNextNeighbor(HnswGraph graph) throws IOException { return graph.nextNeighbor(); } + private static int getGraphSize(HnswGraph graph) { + if (graph instanceof OnHeapHnswGraph) { Review Comment: I see - can be a follow-up, but perhaps we introduce `capacity()` to `HnswGraph` so we can use it uniformly without casting ########## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ########## @@ -99,27 +122,31 @@ public void addNode(int level, int node) { entryNode = node; } - if (level > 0) { - // if the new node introduces a new level, add more levels to the graph, - // and make this node the graph's new entry point - if (level >= numLevels) { - for (int i = numLevels; i <= level; i++) { - graphUpperLevels.add(new HashMap<>()); - } - numLevels = level + 1; - entryNode = node; + if (node >= graph.length) { + if (noGrowth) { + throw new AssertionError( Review Comment: This should either be `assert node < graph.length || noGrowth == false: "...message..."` or else we should throw an `IllegalArgumentException` - I don't think we ought to be throwing `AssertionError` in production code. ########## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ########## @@ -99,27 +122,31 @@ public void addNode(int level, int node) { entryNode = node; } - if (level > 0) { - // if the new node introduces a new level, add more levels to the graph, - // and make this node the graph's new entry point - if (level >= numLevels) { - for (int i = numLevels; i <= level; i++) { - graphUpperLevels.add(new HashMap<>()); - } - numLevels = level + 1; - entryNode = node; + if (node >= graph.length) { + if (noGrowth) { + throw new AssertionError( + "The graph does not expect to growth when an initial size is given"); Review Comment: syntax: growth -> grow ########## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ########## @@ -32,39 +31,54 @@ */ public final class OnHeapHnswGraph extends HnswGraph implements Accountable { + private static final int INIT_SIZE = 128; + private int numLevels; // the current number of levels in the graph private int entryNode; // the current graph entry node on the top level. -1 if not set - // Level 0 is represented as List<NeighborArray> – nodes' connections on level 0. - // Each entry in the list has the top maxConn/maxConn0 neighbors of a node. The nodes correspond - // to vectors - // added to HnswBuilder, and the node values are the ordinals of those vectors. - // Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals. - private final List<NeighborArray> graphLevel0; - // Represents levels 1-N. Each level is represented with a Map that maps a levels level 0 - // ordinal to its neighbors on that level. All nodes are in level 0, so we do not need to maintain - // it in this list. However, to avoid changing list indexing, we always will make the first - // element - // null. - private final List<Map<Integer, NeighborArray>> graphUpperLevels; - private final int nsize; - private final int nsize0; + // the internal graph representation where the first dimension is node id and second dimension is + // level + // e.g. graph[1][2] is all the neighbours of node 1 at level 2 + private NeighborArray[][] graph; + // essentially another 2d map which the first dimension is level and second dimension is node id, + // this is only + // generated on demand when there's someone calling getNodeOnLevel on a non-zero level + private List<Integer>[] levelToNodes; + private int + lastFreezeSize; // remember the size we are at last time to freeze the graph and generate + // levelToNodes + private int size; // graph size, which is number of nodes in level 0 + private int + nonZeroLevelSize; // total number of NeighborArrays created that is not on level 0, for now it + // is only used to account memory usage + private final int nsize; // neighbour array size at non-zero level + private final int nsize0; // neighbour array size at zero level + private final boolean + noGrowth; // if an initial size is passed in, we don't expect the graph to grow itself // KnnGraphValues iterator members private int upto; private NeighborArray cur; - OnHeapHnswGraph(int M) { + /** + * ctor + * + * @param numNodes number of nodes that will be added to this graph, passing in -1 means unbounded + * while passing in a non-negative value will lock the whole graph and disable the graph from + * growth itself (you cannot add a node with has id >= numNodes) Review Comment: syntax nits, should read: "growing itself (you cannot add a node having id >= numNodes)" ########## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ########## @@ -158,50 +185,82 @@ public int entryNode() { return entryNode; } + /** + * WARN: calling this method will essentially iterate through all nodes at level 0 (even if you're + * not getting node at level 0), we have built some caching mechanism such that if graph is not + * changed only the first non-zero level call will pay the cost. So it is highly NOT recommended + * to call this method while the graph is still building. + * + * <p>NOTE: if the node is not inserted in order (e.g. we're init'd from another graph) such that + * there are some node in middle are not inserted. (e.g. the largest node inserted is 10 but node + * 5 is not yet inserted) Calling getNodesOnLevel(0) will lead to a wrong behavior because it is a + * simple iterating over range(size) TODO: maybe we should fix the above behavior? Review Comment: Could we do ``` if (size != capacity) throw new IllegalStateException("graph build not complete, size=" + size + " capacity=" + capacity); ``` ########## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ########## @@ -74,23 +88,32 @@ public final class OnHeapHnswGraph extends HnswGraph implements Accountable { * @param node the node whose neighbors are returned, represented as an ordinal on the level 0. */ public NeighborArray getNeighbors(int level, int node) { - if (level == 0) { - return graphLevel0.get(node); - } - Map<Integer, NeighborArray> levelMap = graphUpperLevels.get(level); - assert levelMap.containsKey(node); - return levelMap.get(node); + assert graph.length > node && graph[node] != null && graph[node][level] != null; Review Comment: I think we might actually prefer not to assert the first two conditions so that we can distinguish NPE from AIOOBE from AssertionError? ########## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ########## @@ -74,23 +88,32 @@ public final class OnHeapHnswGraph extends HnswGraph implements Accountable { * @param node the node whose neighbors are returned, represented as an ordinal on the level 0. */ public NeighborArray getNeighbors(int level, int node) { - if (level == 0) { - return graphLevel0.get(node); - } - Map<Integer, NeighborArray> levelMap = graphUpperLevels.get(level); - assert levelMap.containsKey(node); - return levelMap.get(node); + assert graph.length > node && graph[node] != null && graph[node][level] != null; + return graph[node][level]; } @Override public int size() { - return graphLevel0.size(); // all nodes are located on the 0th level + return size; + } + + /** + * When we initialize from another graph, the capacity, which is length of internal buffer of + * graph, is different from {@link #size()}, which is number of nodes already inserted, such that + * we need two method to retrieve each + * + * @return length of internal representation of graph + */ + public int capacity() { + return graph.length; } /** * Add node on the given level. Nodes can be inserted out of order, but it requires that the nodes * preceded by the node inserted out of order are eventually added. * + * <p>NOTE: You must add a node from the node's top level Review Comment: maybe expand to "You must add a node **starting** from the node's top level"? ########## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ########## @@ -40,31 +41,29 @@ public final class OnHeapHnswGraph extends HnswGraph implements Accountable { // to vectors // added to HnswBuilder, and the node values are the ordinals of those vectors. // Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals. - private final List<NeighborArray> graphLevel0; Review Comment: ok, thanks I didn't see that ########## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ########## @@ -454,77 +454,6 @@ public void testSearchWithSelectiveAcceptOrds() throws IOException { } } - public void testBuildOnHeapHnswGraphOutOfOrder() throws IOException { Review Comment: ok this test is no longer interesting, but could we add a few new tests checking the error states: 1. attempt to add nodes with levels out of order 2. attempt to add nodes out of bounds to pre-allocated graph 3. attempt to retrieve nodes iterator on incomplete graph Also maybe we still a need test where we add nodes out of order? I'm not sure how well that is covered -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org