benwtrent commented on code in PR #12413:
URL: https://github.com/apache/lucene/pull/12413#discussion_r1253619111


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -256,6 +256,72 @@ public NeighborQueue searchLevel(
     return results;
   }
 
+  /**
+   * Function to find the best entry point from which to search the zeroth 
graph layer.
+   *
+   * @param query vector query with which to search
+   * @param vectors random access vector values
+   * @param graph the HNSWGraph
+   * @param visitLimit How many vectors are allowed to be visited
+   * @return An integer array whose first element is the best entry point, and 
second is the number
+   *     of candidates visited. Entry point of `-1` indicates visitation limit 
exceed
+   * @throws IOException When accessing the vector fails
+   */
+  private int[] findBestEntryPoint(
+      T query, RandomAccessVectorValues<T> vectors, HnswGraph graph, int 
visitLimit)
+      throws IOException {
+    int size = graph.size();
+    int visitedCount = 1;
+    prepareScratchState(vectors.size());
+    final NeighborQueue results = new NeighborQueue(1, false);
+    int currentEp = graph.entryNode();
+    float currentScore = compare(query, vectors, currentEp);
+    float minAcceptedSimilarity = currentScore;
+    results.add(currentEp, currentScore);
+    for (int level = graph.numLevels() - 1; level >= 1; level--) {
+      candidates.add(currentEp, currentScore);
+      visited.set(currentEp);
+      // Keep searching the given level until we stop finding a better 
candidate entry point
+      while (candidates.size() > 0) {
+        // get the best candidate (closest or best scoring)
+        float topCandidateSimilarity = candidates.topScore();
+        if (topCandidateSimilarity < minAcceptedSimilarity) {
+          break;
+        }
+
+        int topCandidateNode = candidates.pop();
+        graphSeek(graph, level, topCandidateNode);
+        int friendOrd;
+        while ((friendOrd = graphNextNeighbor(graph)) != NO_MORE_DOCS) {
+          assert friendOrd < size : "friendOrd=" + friendOrd + "; size=" + 
size;
+          if (visited.getAndSet(friendOrd)) {
+            continue;
+          }
+          if (visitedCount >= visitLimit) {
+            return new int[] {-1, visitedCount};
+          }
+          float friendSimilarity = compare(query, vectors, friendOrd);
+          visitedCount++;
+          if (friendSimilarity >= minAcceptedSimilarity) {
+            candidates.add(friendOrd, friendSimilarity);
+            if (results.insertWithOverflow(friendOrd, friendSimilarity) && 
results.size() >= 1) {

Review Comment:
   I agree @msokolov, but there is a weird edge case I am not 100% sure of. 
When I revert back to my commit here: 
https://github.com/apache/lucene/pull/12413/commits/22da1e4f88a8124f390f572ef5ea89202c70a347
   
   My recall numbers change. I can dig more into why that commit's solution is 
buggy and go back to something similar.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to