benwtrent commented on code in PR #13635:
URL: https://github.com/apache/lucene/pull/13635#discussion_r1705409803

##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -70,6 +72,43 @@ public static void search(
     search(scorer, knnCollector, graph, graphSearcher, acceptOrds);
   }
 
+  /**
+   * Searches the HNSW graph for for the nerest neighbors of a query vector, starting from the
+   * provided entry points.
+   *
+   * @param scorer the scorer to compare the query with the nodes
+   * @param knnCollector a collector of top knn results to be returned
+   * @param graph the graph values. May represent the entire graph, or a level in a hierarchical
+   *     graph.
+   * @param acceptOrds {@link Bits} that represents the allowed document ordinals to match, or
+   *     {@code null} if they are all allowed to match.
+   * @param entryPointOrds the entry points for search.
+   */
+  public static void search(
+      RandomVectorScorer scorer,
+      KnnCollector knnCollector,
+      HnswGraph graph,
+      Bits acceptOrds,
+      DocIdSetIterator entryPointOrds)

Review Comment:
These technically are not vector ordinals but doc IDs. If not all docs have a vector field, these will not be a `1-1` mapping and will need to be translated to vector ordinals.

In the search path, you can actually see the reverse happening (translating vector ordinals to doc IDs):

```
final RandomVectorScorer scorer = scorerSupplier.get();
final KnnCollector collector =
    new OrdinalTranslatedKnnCollector(knnCollector, scorer::ordToDoc);
```

So, somewhere upstream, if the seeds are provided, there needs to be a translation into vector ords. I don't know immediately how to do this; I would have to read around the code to see if this is easily possible.


##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -156,6 +189,44 @@ private TopDocs getLeafResults(
     }
   }
 
+  private DocIdSetIterator executeSeedQuery(LeafReaderContext ctx, Weight seedWeight)

Review Comment:
Interesting indeed. I would suppose that, depending on the query, this might actually be slower than doing kNN search.
Though, I guess you could have the benefit of requiring fewer candidates (smaller efSearch).

How should this weight behave with pre-filtering? Shouldn't we apply the filter with the seed query to ensure the starting entry points are also valid candidates?
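
To make the two concerns above concrete (seed doc IDs must be translated to vector ordinals, and seeds should respect the pre-filter), here is a minimal, Lucene-free sketch. Everything in it is hypothetical and for illustration only: the class `SeedOrdinalTranslation`, the method `docsToOrds`, the plain `int[] ordToDoc` table standing in for the `ordToDoc` mapping, and `java.util.BitSet` standing in for `Bits acceptOrds`. It assumes both the seed doc IDs and the ordinal-to-doc table are in ascending order, and it is not how the PR or Lucene actually implements this.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical helper, not part of Lucene: maps seed doc IDs to vector ordinals. */
public class SeedOrdinalTranslation {

  /**
   * Translates seed doc IDs to vector ordinals with a two-pointer merge.
   *
   * @param seedDocs seed doc IDs, ascending (assumption)
   * @param ordToDoc ordToDoc[ord] is the doc ID holding vector ordinal ord, ascending (assumption)
   * @param acceptOrds accepted vector ordinals, or null if every ordinal is accepted
   * @return ordinals of the seeds that have a vector and pass the filter
   */
  public static int[] docsToOrds(int[] seedDocs, int[] ordToDoc, BitSet acceptOrds) {
    int[] out = new int[seedDocs.length];
    int count = 0;
    int ord = 0;
    for (int doc : seedDocs) {
      // Advance to the first ordinal whose doc ID is >= the seed doc ID.
      while (ord < ordToDoc.length && ordToDoc[ord] < doc) {
        ord++;
      }
      if (ord == ordToDoc.length) {
        break; // no remaining vectors, so no remaining seed can match
      }
      // Keep the seed only if the doc has a vector and the ordinal passes the filter.
      if (ordToDoc[ord] == doc && (acceptOrds == null || acceptOrds.get(ord))) {
        out[count++] = ord;
      }
    }
    return Arrays.copyOf(out, count);
  }

  public static void main(String[] args) {
    int[] ordToDoc = {0, 2, 5, 9}; // docs 0, 2, 5 and 9 have vectors
    int[] seeds = {2, 3, 9};       // doc 3 has no vector and is dropped
    System.out.println(Arrays.toString(docsToOrds(seeds, ordToDoc, null))); // prints [1, 3]
  }
}
```

Seeds that have no vector, or whose ordinal is rejected by the filter, are silently dropped, so the search would need a fallback to the graph's regular entry point when no seed survives.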