jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1061991840


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -143,10 +148,64 @@ public OnHeapHnswGraph build(RandomAccessVectorValues vectorsToAdd) throws IOExc
     return hnsw;
   }
 
+  /**
+   * Initializes the graph of this builder. Transfers the nodes and their neighbors from the
+   * initializer graph into the graph being produced by this builder, mapping ordinals from the
+   * initializer graph to their new ordinals in this builder's graph. The builder's graph must be
+   * empty before calling this method.
+   *
+   * @param initializerGraph graph used for initialization
+   * @param oldToNewOrdinalMap map for converting from ordinals in the initializerGraph to this
+   *     builder's graph
+   */
+  public void initializeFromGraph(
+      HnswGraph initializerGraph, Map<Integer, Integer> oldToNewOrdinalMap) throws IOException {
+    assert hnsw.size() == 0;
+    float[] vectorValue = null;
+    BytesRef binaryValue = null;
+    for (int level = 0; level < initializerGraph.numLevels(); level++) {
+      HnswGraph.NodesIterator it = initializerGraph.getNodesOnLevel(level);
+
+      while (it.hasNext()) {
+        int oldOrd = it.nextInt();
+        int newOrd = oldToNewOrdinalMap.get(oldOrd);
+
+        hnsw.addNode(level, newOrd);
+
+        if (level == 0) {
+          initializedNodes.add(newOrd);
+        }
+
+        switch (this.vectorEncoding) {
+          case FLOAT32 -> vectorValue = vectors.vectorValue(newOrd);
+          case BYTE -> binaryValue = vectors.binaryValue(newOrd);
+        }
+
+        NeighborArray newNeighbors = this.hnsw.getNeighbors(level, newOrd);
+        initializerGraph.seek(level, oldOrd);
+        for (int oldNeighbor = initializerGraph.nextNeighbor();
+            oldNeighbor != NO_MORE_DOCS;
+            oldNeighbor = initializerGraph.nextNeighbor()) {
+          int newNeighbor = oldToNewOrdinalMap.get(oldNeighbor);
+          float score =
+              switch (this.vectorEncoding) {
+                case FLOAT32 -> this.similarityFunction.compare(

Review Comment:
   > Is this sorted order only used for calculating diversity easier?
   
   Yes, I think you are correct. The reason for keeping the neighbors sorted by 
distance during construction is that the neighbor array of an inserted node 
continues to be updated as more nodes are inserted. Keeping it sorted makes the 
worst neighbor (or neighbors) easier to identify. 
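
   To illustrate the point, here is a minimal sketch of a bounded neighbor list 
kept sorted by score. It is not Lucene's `NeighborArray`, and it does not model 
the diversity check. The class name `SortedNeighbors` and its methods are 
hypothetical, and it assumes a higher score means a closer neighbor. Because the 
list stays ordered best-first, the worst neighbor is always the last entry:

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Lucene's NeighborArray. */
final class SortedNeighbors {
  private final int[] nodes;
  private final float[] scores;
  private int size;

  SortedNeighbors(int maxSize) {
    nodes = new int[maxSize];
    scores = new float[maxSize];
  }

  /**
   * Inserts a neighbor, keeping scores in descending order (best first).
   * Returns the node that was dropped to make room, the candidate itself
   * if it was rejected, or -1 if nothing was dropped.
   */
  int insert(int node, float score) {
    int evicted = -1;
    if (size == nodes.length) {
      // The list is sorted, so the worst neighbor is the last entry: O(1) lookup.
      if (score <= scores[size - 1]) {
        return node; // candidate is no better than the current worst
      }
      evicted = nodes[size - 1];
      size--;
    }
    // Shift lower-scoring entries right to open a slot for the new neighbor.
    int pos = size;
    while (pos > 0 && scores[pos - 1] < score) {
      nodes[pos] = nodes[pos - 1];
      scores[pos] = scores[pos - 1];
      pos--;
    }
    nodes[pos] = node;
    scores[pos] = score;
    size++;
    return evicted;
  }

  /** Returns the worst (lowest-scoring) neighbor; the list must be non-empty. */
  int worstNode() {
    return nodes[size - 1];
  }

  /** Returns a copy of the neighbors, best first. */
  int[] nodes() {
    return Arrays.copyOf(nodes, size);
  }
}
```

   With an unsorted array, each later update would need a full scan to find the 
entry to drop. Here the insertion pays the ordering cost once, and every later 
update reads the worst entry directly.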



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

