msokolov commented on code in PR #14181:
URL: https://github.com/apache/lucene/pull/14181#discussion_r1935932869


##########
lucene/core/src/java/org/apache/lucene/codecs/hnsw/DefaultFlatVectorScorer.java:
##########
@@ -90,23 +91,29 @@ public String toString() {
   private static final class ByteScoringSupplier implements 
RandomVectorScorerSupplier {
     private final ByteVectorValues vectors;
     private final ByteVectorValues vectors1;
-    private final ByteVectorValues vectors2;
     private final VectorSimilarityFunction similarityFunction;
 
     private ByteScoringSupplier(
         ByteVectorValues vectors, VectorSimilarityFunction similarityFunction) 
throws IOException {
       this.vectors = vectors;
       vectors1 = vectors.copy();

Review Comment:
   since it is asymmetric now, instead of 1 vs 2, maybe rename to 
`targetVectors` or so?



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##########
@@ -190,12 +191,25 @@ private int getStartPos(int maxOrd) {
       }
     }
 
+    @Override
+    public void addGraphNode(int node, UpdateableRandomVectorScorer scorer) 
throws IOException {
+      if (initializedNodes != null && initializedNodes.get(node)) {
+        return;
+      }
+      super.addGraphNode(node, scorer);
+    }
+
     @Override
     public void addGraphNode(int node) throws IOException {

Review Comment:
   curious if we still need this, and generally, if we are still using 
non-updateable RandomVectorScorer?



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##########
@@ -190,12 +191,25 @@ private int getStartPos(int maxOrd) {
       }
     }
 
+    @Override
+    public void addGraphNode(int node, UpdateableRandomVectorScorer scorer) 
throws IOException {
+      if (initializedNodes != null && initializedNodes.get(node)) {
+        return;
+      }
+      super.addGraphNode(node, scorer);
+    }
+
     @Override
     public void addGraphNode(int node) throws IOException {
       if (initializedNodes != null && initializedNodes.get(node)) {
         return;
       }
-      super.addGraphNode(node);
+      if (scorer == null) {
+        scorer = scorerSupplier.scorer(node);

Review Comment:
   would it make sense to add this (maybe with node==0?) when creating the 
builder so it's never null



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -191,8 +191,14 @@ protected void addVectors(int minOrd, int maxOrd) throws 
IOException {
     if (infoStream.isEnabled(HNSW_COMPONENT)) {
       infoStream.message(HNSW_COMPONENT, "addVectors [" + minOrd + " " + 
maxOrd + ")");
     }
+    UpdateableRandomVectorScorer scorer = null;
     for (int node = minOrd; node < maxOrd; node++) {
-      addGraphNode(node);
+      if (scorer == null) {
+        scorer = scorerSupplier.scorer(node);

Review Comment:
   ditto here - these lazy inits are okay, but proactively initializing is 
better



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -296,6 +282,30 @@ to the newly introduced levels (repeating step 2,3 for new 
levels) and again try
     } while (true);
   }
 
+  @Override
+  public void addGraphNode(int node) throws IOException {
+    /*
+    Note: this implementation is thread safe when graph size is fixed (e.g. 
when merging)
+    The process of adding a node is roughly:
+    1. Add the node to all level from top to the bottom, but do not connect it 
to any other node,
+       nor try to promote itself to an entry node before the connection is 
done. (Unless the graph is empty
+       and this is the first node, in that case we set the entry node and 
return)
+    2. Do the search from top to bottom, remember all the possible neighbours 
on each level the node
+       is on.
+    3. Add the neighbor to the node from bottom to top level, when adding the 
neighbour,
+       we always add all the outgoing links first before adding incoming link 
such that
+       when a search visits this node, it can always find a way out
+    4. If the node has level that is less or equal to graph level, then we're 
done here.
+       If the node has level larger than graph level, then we need to promote 
the node
+       as the entry node. If, while we add the node to the graph, the entry 
node has changed
+       (which means the graph level has changed as well), we need to reinsert 
the node
+       to the newly introduced levels (repeating step 2,3 for new levels) and 
again try to
+       promote the node to entry node.
+    */
+    UpdateableRandomVectorScorer scorer = scorerSupplier.scorer(node);

Review Comment:
   how did we get away with not doing this before? Is it that we pushed the 
supplier.scorer() call deeper in the call graph?



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -463,7 +478,11 @@ private boolean connectComponents(int level) throws 
IOException {
 
           beam.clear();
           eps[0] = c0.start();
-          RandomVectorScorer scorer = scorerSupplier.scorer(c.start());
+          if (scorer == null) {

Review Comment:
   oh yes, here it is -- again could we initialize outside the loop, maybe even 
initialize to all zeros? Can ScorerSupplier accept null??



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to