jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1311935419
########## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ########## @@ -172,106 +83,36 @@ public static KnnCollector search( * @return a set of collected vectors holding the nearest neighbors found */ public static KnnCollector search( - byte[] query, - int topK, - RandomAccessVectorValues<byte[]> vectors, - VectorEncoding vectorEncoding, - VectorSimilarityFunction similarityFunction, - HnswGraph graph, - Bits acceptOrds, - int visitedLimit) - throws IOException { - KnnCollector collector = new TopKnnCollector(topK, visitedLimit); - search(query, collector, vectors, vectorEncoding, similarityFunction, graph, acceptOrds); - return collector; - } - - /** - * Searches HNSW graph for the nearest neighbors of a query vector. - * - * @param query search query vector - * @param knnCollector a collector of top knn results to be returned - * @param vectors the vector values - * @param similarityFunction the similarity function to compare vectors - * @param graph the graph values. May represent the entire graph, or a level in a hierarchical - * graph. - * @param acceptOrds {@link Bits} that represents the allowed document ordinals to match, or - * {@code null} if they are all allowed to match. - */ - public static void search( - byte[] query, - KnnCollector knnCollector, - RandomAccessVectorValues<byte[]> vectors, - VectorEncoding vectorEncoding, - VectorSimilarityFunction similarityFunction, - HnswGraph graph, - Bits acceptOrds) - throws IOException { - if (query.length != vectors.dimension()) { - throw new IllegalArgumentException( - "vector query dimension: " - + query.length - + " differs from field dimension: " - + vectors.dimension()); - } - HnswGraphSearcher<byte[]> graphSearcher = - new HnswGraphSearcher<>( - vectorEncoding, - similarityFunction, - new NeighborQueue(knnCollector.k(), true), - new SparseFixedBitSet(vectors.size())); - search(query, knnCollector, vectors, graph, graphSearcher, acceptOrds); - } - - /** - * Search {@link OnHeapHnswGraph}, this method is thread safe, for parameters please refer to - * {@link #search(byte[], int, RandomAccessVectorValues, VectorEncoding, VectorSimilarityFunction, - * HnswGraph, Bits, int)} - */ - public static KnnCollector search( - byte[] query, - int topK, - RandomAccessVectorValues<byte[]> vectors, - VectorEncoding vectorEncoding, - VectorSimilarityFunction similarityFunction, - OnHeapHnswGraph graph, - Bits acceptOrds, - int visitedLimit) + RandomVectorScorer scorer, int topK, OnHeapHnswGraph graph, Bits acceptOrds, int visitedLimit) Review Comment: This function is only used by the `Word2VecSynonymProvider` to search the on-heap builder/searcher with multiple threads. I think we can simplify things a bit to not require to have an `OnHeapHnswGraph` for this purpose. It's outside the scope of this PR but the main reason this function exists is because the `HnswGraph` API exposes `seek` and `nextNeighbor` as separate function so the implementer needs to keep a state in the object. We could get rid of this by changing the API in a follow up and only expose a `PrimitiveIterator.OfInt getNeighbors()`. In the latest codec we also eagerly load the entire list on each `seek` call (for the off heap version) so we don't really need the iterator API either. Maybe exposing it as a simple `int[] getNeighbors()` could be enough. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org