benwtrent commented on code in PR #14094:
URL: https://github.com/apache/lucene/pull/14094#discussion_r2007365351


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OrdinalTranslatedKnnCollector.java:
##########
@@ -50,4 +51,11 @@ public TopDocs topDocs() {
                 : TotalHits.Relation.EQUAL_TO),
         td.scoreDocs);
   }
+
+  @Override
+  public void nextCandidate() {
+    if (this.collector instanceof HnswKnnCollector) {
+      ((HnswKnnCollector) this.collector).nextCandidate();
+    }

Review Comment:
   ```suggestion
       if (this.collector instanceof HnswKnnCollector hnswCollector) {
         hnswCollector.nextCandidate();
       }
   ```



##########
lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java:
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+/**
+ * A {@link HnswKnnCollector} that early exits when nearest neighbor queue 
keeps saturating beyond a
+ * 'patience' parameter. This records the rate of collection of new nearest 
neighbors in the {@code
+ * delegate} KnnCollector queue, at each HNSW node candidate visit. Once it 
saturates for a number
+ * of consecutive node visits (e.g., the patience parameter), this early 
terminates.
+ *
+ * @lucene.experimental
+ */
+public class HnswQueueSaturationCollector extends HnswKnnCollector {
+
+  private final KnnCollector delegate;
+  private final double saturationThreshold;
+  private final int patience;
+  private boolean patienceFinished;
+  private int countSaturated;
+  private int previousQueueSize;
+  private int currentQueueSize;
+
+  HnswQueueSaturationCollector(KnnCollector delegate, double 
saturationThreshold, int patience) {
+    super(delegate);
+    this.delegate = delegate;
+    this.previousQueueSize = 0;
+    this.currentQueueSize = 0;
+    this.countSaturated = 0;
+    this.patienceFinished = false;
+    this.saturationThreshold = saturationThreshold;
+    this.patience = patience;
+  }
+
+  @Override
+  public boolean earlyTerminated() {
+    return delegate.earlyTerminated() || patienceFinished;
+  }
+
+  @Override
+  public boolean collect(int docId, float similarity) {
+    boolean collect = delegate.collect(docId, similarity);
+    if (collect) {
+      currentQueueSize++;
+    }
+    return collect;
+  }
+
+  @Override
+  public float minCompetitiveSimilarity() {
+    return delegate.minCompetitiveSimilarity();
+  }

Review Comment:
   since we are a decorator, do we need this?



##########
lucene/core/src/java/org/apache/lucene/search/HnswKnnCollector.java:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+/**
+ * {@link KnnCollector} that exposes methods to hook into specific parts of 
the HNSW algorithm.
+ *
+ * @lucene.experimental
+ */
+public abstract class HnswKnnCollector extends KnnCollector.Decorator {

Review Comment:
   Ah, it is a little frustrating as we already have an "HNSWStrategy" and now 
we have an "HNSWCollector". 
   
   Could we utilize an HNSWStrategy? Or make `nextCandidate` a more general API?
   
   My thought on the strategy would be that the graph searcher to indicate 
through the strategy object when the next group of vectors will be searched and 
the strategy would have a reference to the collector to which it can forward 
the request. 
   
   Of course, this still requires a new `HnswQueueSaturationCollector`, but it 
won't require these new base classes.



##########
lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java:
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+/**
+ * A {@link HnswKnnCollector} that early exits when nearest neighbor queue 
keeps saturating beyond a
+ * 'patience' parameter. This records the rate of collection of new nearest 
neighbors in the {@code
+ * delegate} KnnCollector queue, at each HNSW node candidate visit. Once it 
saturates for a number
+ * of consecutive node visits (e.g., the patience parameter), this early 
terminates.
+ *
+ * @lucene.experimental
+ */
+public class HnswQueueSaturationCollector extends HnswKnnCollector {
+
+  private final KnnCollector delegate;
+  private final double saturationThreshold;
+  private final int patience;
+  private boolean patienceFinished;
+  private int countSaturated;
+  private int previousQueueSize;
+  private int currentQueueSize;
+
+  HnswQueueSaturationCollector(KnnCollector delegate, double 
saturationThreshold, int patience) {
+    super(delegate);
+    this.delegate = delegate;
+    this.previousQueueSize = 0;
+    this.currentQueueSize = 0;
+    this.countSaturated = 0;
+    this.patienceFinished = false;
+    this.saturationThreshold = saturationThreshold;
+    this.patience = patience;
+  }
+
+  @Override
+  public boolean earlyTerminated() {
+    return delegate.earlyTerminated() || patienceFinished;
+  }
+
+  @Override
+  public boolean collect(int docId, float similarity) {
+    boolean collect = delegate.collect(docId, similarity);
+    if (collect) {
+      currentQueueSize++;
+    }
+    return collect;
+  }
+
+  @Override
+  public float minCompetitiveSimilarity() {
+    return delegate.minCompetitiveSimilarity();
+  }
+
+  @Override
+  public TopDocs topDocs() {
+    TopDocs topDocs;
+    if (patienceFinished && delegate.earlyTerminated() == false) {
+      TopDocs delegateDocs = delegate.topDocs();
+      TotalHits totalHits =
+          new TotalHits(delegateDocs.totalHits.value(), 
TotalHits.Relation.EQUAL_TO);
+      topDocs = new TopDocs(totalHits, delegateDocs.scoreDocs);
+    } else {
+      topDocs = delegate.topDocs();
+    }
+    return topDocs;
+  }
+
+  @Override
+  public void nextCandidate() {

Review Comment:
   @tteofili what do you think of making this more general? I think having a 
"nextCandidate" or "nextBlockOfVectors" is generally useful, and might be 
applicable to all types of kNN indices.
   
   For example:
   
    - Flat, you just get called once, indicating you are searching ALL vectors
    - HNSW, you get called for each NSW (or in the case of filtered search, 
extended NSW)
    - IVF, you get called for each posting list
    - Vamana, you get called for each node before calling the neighbors
   
   Do you think we can make this API general?
   
   Maybe not, I am not sure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to