msokolov commented on a change in pull request #267:
URL: https://github.com/apache/lucene/pull/267#discussion_r700609993



##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -188,15 +244,28 @@ public int size() {
   }
 
   // TODO: optimize RAM usage so not to store references for all nodes for 
levels > 0
+  // TODO: add extra levels if level >= numLevels
   public void addNode(int level, int node) {
     if (level > 0) {
+      // if the new node introduces a new level, make this node the graph's 
new entry point
+      if (level > curMaxLevel) {
+        curMaxLevel = level;
+        entryNode = node;
+        // add more levels if needed
+        if (level >= graph.size()) {
+          for (int i = graph.size(); i <= level; i++) {
+            graph.add(new ArrayList<>());
+          }
+        }
+      }
       // Levels above 0th don't contain all nodes,

Review comment:
       I was thinking of keeping what you have now, but `graph.get(level)` 
would be the size of that level, with no nulls, and the numbers in its neighbor 
arrays would be indices in that same level. Then in parallel, each level would 
have `int level0Node[]` and `int nextLevelNode[]` of the same size; each entry 
in `level0Node` tells the node id in level 0 of the current node, and in 
`nextLevelNode` tells the node id in the next level "down". so I guess there 
would actually be a pair of `List<int[]>`, and `level0.get(0)` and 
`nextLevel.get(0)` would be both be `null` (since you don't need to map from 
level 0 to itself or to a nonexistent next level), and `nextLevel.get(1)` and 
`level0.get(1)` would be the same array.

##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -107,32 +113,82 @@ public static NeighborQueue search(
       Random random)
       throws IOException {
     int size = graphValues.size();
+    int boundedNumSeed = Math.min(numSeed, 2 * size);
+    NeighborQueue results;
+
+    if (graphValues.maxLevel() == 0) {
+      // search in SNW; generate a number of entry points randomly
+      final int[] eps = new int[boundedNumSeed];
+      for (int i = 0; i < boundedNumSeed; i++) {
+        eps[i] = random.nextInt(size);

Review comment:
       ah, fair enough

##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -107,32 +113,82 @@ public static NeighborQueue search(
       Random random)
       throws IOException {
     int size = graphValues.size();
+    int boundedNumSeed = Math.min(numSeed, 2 * size);
+    NeighborQueue results;
+
+    if (graphValues.maxLevel() == 0) {
+      // search in SNW; generate a number of entry points randomly
+      final int[] eps = new int[boundedNumSeed];
+      for (int i = 0; i < boundedNumSeed; i++) {
+        eps[i] = random.nextInt(size);
+      }
+      return searchLevel(query, topK, 0, eps, vectors, similarityFunction, 
graphValues, acceptOrds);
+    } else {
+      // search in hierarchical SNW
+      int[] eps = new int[] {graphValues.entryNode()};
+      for (int level = graphValues.maxLevel(); level >= 1; level--) {
+        results =
+            HnswGraph.searchLevel(
+                query, 1, level, eps, vectors, similarityFunction, 
graphValues, acceptOrds);
+        eps = new int[] {results.pop()};
+      }
+      results =
+          HnswGraph.searchLevel(
+              query, boundedNumSeed, 0, eps, vectors, similarityFunction, 
graphValues, acceptOrds);
+      while (results.size() > topK) {
+        results.pop();
+      }
+      return results;
+    }
+  }
 
+  /**
+   * Searches for the nearest neighbors of a query vector in a given level
+   *
+   * @param query search query vector
+   * @param topK the number of nearest to query results to return

Review comment:
       oh, I see - for flat graph we have eps.length > 1, but it is always = 1 
in the HNSW case? I think it's fine




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to