Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

via GitHub Tue, 06 Aug 2024 05:04:48 -0700


benwtrent commented on code in PR #13635:
URL: https://github.com/apache/lucene/pull/13635#discussion_r1705409803



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -70,6 +72,43 @@ public static void search(
     search(scorer, knnCollector, graph, graphSearcher, acceptOrds);
   }
 
+  /**
+   * Searches the HNSW graph for the nearest neighbors of a query vector, starting from the
+   * provided entry points.
+   *
+   * @param scorer the scorer to compare the query with the nodes
+   * @param knnCollector a collector of top knn results to be returned
+   * @param graph the graph values. May represent the entire graph, or a level in a hierarchical
+   *     graph.
+   * @param acceptOrds {@link Bits} that represents the allowed document ordinals to match, or
+   *     {@code null} if they are all allowed to match.
+   * @param entryPointOrds the entry points for search.
+   */
+  public static void search(
+      RandomVectorScorer scorer,
+      KnnCollector knnCollector,
+      HnswGraph graph,
+      Bits acceptOrds,
+      DocIdSetIterator entryPointOrds)

Review Comment:
   These are technically not vector ordinals but doc IDs. If not all docs have a vector field, the mapping between the two is not `1-1`, so the doc IDs will need to be translated to vector ordinals.
   
   In the search path, you can actually see the reverse happening (translating vector ordinals to doc IDs):
   
   ```
       final RandomVectorScorer scorer = scorerSupplier.get();
       final KnnCollector collector =
           new OrdinalTranslatedKnnCollector(knnCollector, scorer::ordToDoc);
   ```
   
   So, somewhere upstream, if seeds are provided, they need to be translated into vector ordinals. I don't know offhand how to do this; I would have to read around the code to see whether it is easily possible.
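As a rough illustration of the doc-ID-to-ordinal translation discussed above, here is a minimal sketch. It assumes (not confirmed by this thread) that within a segment vector ordinals are assigned in increasing doc-ID order, so the ord-to-doc mapping can be modeled as a strictly increasing `int[] ordToDoc` and inverted with a binary search. The class `SeedOrdTranslation` and method `docsToOrds` are hypothetical names, not Lucene API; seeds whose doc has no vector are simply dropped, since they cannot be graph entry points.

```java
import java.util.Arrays;

/** Hypothetical helper: maps seed doc IDs to vector ordinals for a sparse vector field. */
public final class SeedOrdTranslation {

  /**
   * Translates seed doc IDs into vector ordinals.
   *
   * @param ordToDoc doc ID for each vector ordinal; assumed strictly increasing
   * @param seedDocs candidate entry-point doc IDs, in any order
   * @return ordinals of the seed docs that actually have a vector, in ascending order
   */
  public static int[] docsToOrds(int[] ordToDoc, int[] seedDocs) {
    int[] ords = new int[seedDocs.length];
    int count = 0;
    for (int doc : seedDocs) {
      // Binary search is valid only because ordinals are assumed to follow doc-ID order.
      int ord = Arrays.binarySearch(ordToDoc, doc);
      if (ord >= 0) {
        ords[count++] = ord; // doc has a vector: keep its ordinal
      }
      // ord < 0: the doc has no vector for this field, so it cannot be an entry point
    }
    int[] result = Arrays.copyOf(ords, count);
    Arrays.sort(result);
    return result;
  }

  public static void main(String[] args) {
    // Docs 0, 3, 4 and 9 have vectors; all other docs do not.
    int[] ordToDoc = {0, 3, 4, 9};
    int[] seedDocs = {9, 2, 3};
    System.out.println(Arrays.toString(docsToOrds(ordToDoc, seedDocs))); // prints [1, 3]
  }
}
```

In a real segment the mapping would come from the vector values themselves rather than a materialized array; the array here only stands in for that lookup.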



##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -156,6 +189,44 @@ private TopDocs getLeafResults(
     }
   }
 
+  private DocIdSetIterator executeSeedQuery(LeafReaderContext ctx, Weight seedWeight)

Review Comment:
   Interesting indeed. 
   
   I suspect that, depending on the query, this might actually be slower than doing a plain kNN search. Though I guess you could benefit from needing fewer candidates (a smaller efSearch).
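To make the efSearch trade-off concrete, here is a minimal, self-contained sketch of a best-first search over a single graph layer that starts from caller-supplied entry points instead of the graph's top-level entry node. It is not Lucene's `HnswGraphSearcher`: the class `SeededLayerSearch`, the adjacency-array graph, and the `ef` parameter are assumptions for illustration. The intuition it shows is that seeds already close to the query let the search stop after visiting few nodes, which is the hoped-for benefit of a smaller efSearch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: best-first search over one graph layer, seeded with entry points. */
public final class SeededLayerSearch {

  private static final class Node {
    final int ord;
    final double score;

    Node(int ord, double score) {
      this.ord = ord;
      this.score = score;
    }
  }

  /**
   * @param neighbors adjacency list: neighbors[ord] holds the ordinals linked from ord
   * @param score similarity of a node to the query (higher is better)
   * @param entryOrds seed ordinals to start from
   * @param ef size of the dynamic result list (the efSearch analogue); must be >= 1
   * @return up to ef ordinals, best first
   */
  public static List<Integer> search(
      int[][] neighbors, IntToDoubleFunction score, int[] entryOrds, int ef) {
    Comparator<Node> byScore = Comparator.comparingDouble(n -> n.score);
    PriorityQueue<Node> candidates = new PriorityQueue<>(byScore.reversed()); // best first
    PriorityQueue<Node> results = new PriorityQueue<>(byScore); // worst first, capped at ef
    boolean[] visited = new boolean[neighbors.length];

    // Seed both queues with every provided entry point.
    for (int ord : entryOrds) {
      if (visited[ord]) continue;
      visited[ord] = true;
      Node n = new Node(ord, score.applyAsDouble(ord));
      candidates.add(n);
      results.add(n);
      if (results.size() > ef) results.poll();
    }

    while (!candidates.isEmpty()) {
      Node c = candidates.poll();
      // Stop once the best remaining candidate cannot improve a full result list.
      if (results.size() >= ef && c.score < results.peek().score) break;
      for (int nbr : neighbors[c.ord]) {
        if (visited[nbr]) continue;
        visited[nbr] = true;
        double s = score.applyAsDouble(nbr);
        if (results.size() < ef || s > results.peek().score) {
          Node n = new Node(nbr, s);
          candidates.add(n);
          results.add(n);
          if (results.size() > ef) results.poll(); // evict the current worst result
        }
      }
    }

    List<Node> sorted = new ArrayList<>(results);
    sorted.sort(byScore.reversed());
    List<Integer> ords = new ArrayList<>();
    for (Node n : sorted) ords.add(n.ord);
    return ords;
  }

  public static void main(String[] args) {
    // A path graph 0-1-2-3-4; the "query" is closest to node 4.
    int[][] path = {{1}, {0, 2}, {1, 3}, {2, 4}, {3}};
    IntToDoubleFunction sim = ord -> -Math.abs(ord - 4);
    System.out.println(search(path, sim, new int[] {0}, 2)); // prints [4, 3]
  }
}
```

Seeding at node 3 instead of node 0 reaches the same answer while scoring fewer nodes, which is the effect the seed query is meant to exploit; whether that outweighs the cost of running the seed query is exactly the open question above.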
   
   How should this weight behave with pre-filtering? Shouldn't we apply the filter together with the seed query to ensure the starting entry points are also valid candidates?
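One way to act on the pre-filtering question is to intersect the seed query's matches with the filter's matches before using them as entry points. In Lucene terms, that could plausibly be done by combining the seed query and the filter in a `BooleanQuery` with a `BooleanClause.Occur.FILTER` clause, though whether that fits this PR is an assumption. The sketch below only shows the underlying merge-style intersection of two ascending doc-ID lists, similar in spirit to how a conjunction of iterators advances; `FilteredSeeds` and `intersect` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical helper: restricts seed docs to those that also pass the pre-filter. */
public final class FilteredSeeds {

  /** Intersects two ascending, duplicate-free doc-ID arrays. */
  public static int[] intersect(int[] seedDocs, int[] filterDocs) {
    int[] out = new int[Math.min(seedDocs.length, filterDocs.length)];
    int i = 0, j = 0, n = 0;
    while (i < seedDocs.length && j < filterDocs.length) {
      if (seedDocs[i] == filterDocs[j]) {
        out[n++] = seedDocs[i]; // seed passes the filter: usable entry point
        i++;
        j++;
      } else if (seedDocs[i] < filterDocs[j]) {
        i++; // seed doc is rejected by the filter
      } else {
        j++; // filtered doc was not a seed
      }
    }
    return Arrays.copyOf(out, n);
  }

  public static void main(String[] args) {
    int[] seeds = {2, 5, 8, 13};
    int[] filter = {1, 2, 3, 8, 21};
    int[] valid = intersect(seeds, filter);
    // An empty result would mean no seed passes the filter; a caller could then
    // fall back to the graph's default entry point.
    System.out.println(Arrays.toString(valid)); // prints [2, 8]
  }
}
```

The surviving doc IDs would still need the doc-to-ordinal translation raised in the first comment before they can be handed to the graph searcher.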



-- 
This is an automated message from the Apache Git Service.