mayya-sharipova commented on a change in pull request #267: URL: https://github.com/apache/lucene/pull/267#discussion_r700580884
########## File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java ########## @@ -107,32 +113,82 @@ public static NeighborQueue search( Random random) throws IOException { int size = graphValues.size(); + int boundedNumSeed = Math.min(numSeed, 2 * size); + NeighborQueue results; + + if (graphValues.maxLevel() == 0) { + // search in SNW; generate a number of entry points randomly + final int[] eps = new int[boundedNumSeed]; + for (int i = 0; i < boundedNumSeed; i++) { + eps[i] = random.nextInt(size); + } + return searchLevel(query, topK, 0, eps, vectors, similarityFunction, graphValues, acceptOrds); + } else { + // search in hierarchical SNW + int[] eps = new int[] {graphValues.entryNode()}; + for (int level = graphValues.maxLevel(); level >= 1; level--) { + results = + HnswGraph.searchLevel( + query, 1, level, eps, vectors, similarityFunction, graphValues, acceptOrds); + eps = new int[] {results.pop()}; + } + results = + HnswGraph.searchLevel( + query, boundedNumSeed, 0, eps, vectors, similarityFunction, graphValues, acceptOrds); + while (results.size() > topK) { + results.pop(); + } + return results; + } + } + /** + * Searches for the nearest neighbors of a query vector in a given level + * + * @param query search query vector + * @param topK the number of nearest to query results to return Review comment: Indeed, it is usually the case except these conditions: 1. When doing a hierarchical search. As we reach level 0, we use a single `ep` found from the level 1 (`eps.length=1`), while topK is > 1. 2. When building a hierarchical graph. When adding a node, as we reach `nodeLevel` - a starting level for this node, we use a single `ep` found from the `level = nodeLevel + 1`, while using topk = efconstruction. I am happy to discuss alternative API where we may not need topK even for these cases, for exampling expanding `eps` array to have at least `topK` elements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org