benwtrent commented on code in PR #12413:
URL: https://github.com/apache/lucene/pull/12413#discussion_r1253619111
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -256,6 +256,72 @@ public NeighborQueue searchLevel(
     return results;
   }
 
+  /**
+   * Function to find the best entry point from which to search the zeroth graph layer.
+   *
+   * @param query vector query with which to search
+   * @param vectors random access vector values
+   * @param graph the HNSWGraph
+   * @param visitLimit How many vectors are allowed to be visited
+   * @return An integer array whose first element is the best entry point, and second is the number
+   *     of candidates visited. Entry point of `-1` indicates visitation limit exceed
+   * @throws IOException When accessing the vector fails
+   */
+  private int[] findBestEntryPoint(
+      T query, RandomAccessVectorValues<T> vectors, HnswGraph graph, int visitLimit)
+      throws IOException {
+    int size = graph.size();
+    int visitedCount = 1;
+    prepareScratchState(vectors.size());
+    final NeighborQueue results = new NeighborQueue(1, false);
+    int currentEp = graph.entryNode();
+    float currentScore = compare(query, vectors, currentEp);
+    float minAcceptedSimilarity = currentScore;
+    results.add(currentEp, currentScore);
+    for (int level = graph.numLevels() - 1; level >= 1; level--) {
+      candidates.add(currentEp, currentScore);
+      visited.set(currentEp);
+      // Keep searching the given level until we stop finding a better candidate entry point
+      while (candidates.size() > 0) {
+        // get the best candidate (closest or best scoring)
+        float topCandidateSimilarity = candidates.topScore();
+        if (topCandidateSimilarity < minAcceptedSimilarity) {
+          break;
+        }
+
+        int topCandidateNode = candidates.pop();
+        graphSeek(graph, level, topCandidateNode);
+        int friendOrd;
+        while ((friendOrd = graphNextNeighbor(graph)) != NO_MORE_DOCS) {
+          assert friendOrd < size : "friendOrd=" + friendOrd + "; size=" + size;
+          if (visited.getAndSet(friendOrd)) {
+            continue;
+          }
+          if (visitedCount >= visitLimit) {
+            return new int[] {-1, visitedCount};
+          }
+          float friendSimilarity = compare(query, vectors, friendOrd);
+          visitedCount++;
+          if (friendSimilarity >= minAcceptedSimilarity) {
+            candidates.add(friendOrd, friendSimilarity);
+            if (results.insertWithOverflow(friendOrd, friendSimilarity) && results.size() >= 1) {

Review Comment:
   I agree @msokolov, but there is a weird edge case I am not 100% sure of. When I revert back to my commit here: https://github.com/apache/lucene/pull/12413/commits/22da1e4f88a8124f390f572ef5ea89202c70a347

   My recall numbers change. I can dig more into why that commit's solution is buggy and go back to something similar.
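
For anyone following the thread without the full diff at hand, here is a minimal sketch of a plain greedy descent through the upper graph layers, which is the general idea behind looking for a single best entry point before searching layer zero. It is not the code in this PR and not the commit referenced above. The class name `GreedyEntryPointSketch`, the `int[][][] levels` adjacency layout, and the dot-product `similarity` are all hypothetical stand-ins for Lucene's `HnswGraph`, `NeighborQueue`, and `compare`. Only the return convention (`{entryPoint, visitedCount}`, with `-1` when the visit limit is reached) mirrors the javadoc in the quoted diff.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for the upper-layer search; not Lucene code.
final class GreedyEntryPointSketch {

  /** Dot-product similarity (an assumption for this sketch); higher means closer. */
  static float similarity(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * Greedily walks levels (numLevels - 1) down to 1, moving to any neighbor that
   * scores strictly better than the current node.
   *
   * @param levels levels[level][node] holds the neighbors of node on that level
   * @return {bestNode, visitedCount}, or {-1, visitedCount} if visitLimit is reached
   */
  static int[] findBestEntryPoint(
      float[] query, float[][] vectors, int[][][] levels, int entryNode, int visitLimit) {
    boolean[] visited = new boolean[vectors.length];
    int current = entryNode;
    float currentScore = similarity(query, vectors[current]);
    visited[current] = true;
    int visitedCount = 1;

    for (int level = levels.length - 1; level >= 1; level--) {
      boolean foundBetter = true;
      // Stay on this level until no unvisited neighbor improves on the current node.
      while (foundBetter) {
        foundBetter = false;
        for (int friend : levels[level][current]) {
          if (visited[friend]) {
            continue;
          }
          visited[friend] = true;
          if (visitedCount >= visitLimit) {
            return new int[] {-1, visitedCount};
          }
          float friendScore = similarity(query, vectors[friend]);
          visitedCount++;
          if (friendScore > currentScore) {
            current = friend;
            currentScore = friendScore;
            foundBetter = true;
          }
        }
      }
    }
    return new int[] {current, visitedCount};
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0.5f, 0.5f}, {0f, 1f}, {-1f, 0f}};
    int[][][] levels = {
      {{1, 3}, {0, 2}, {1, 3}, {0, 2}}, // level 0 (not searched here)
      {{1}, {0, 2}, {1}, {}},           // level 1: chain 0 - 1 - 2
      {{}, {}, {}, {}}                  // level 2: only the entry node, no links
    };
    int[] found = findBestEntryPoint(new float[] {0f, 1f}, vectors, levels, 0, 10);
    System.out.println(Arrays.toString(found)); // prints [2, 3]
  }
}
```

The sketch tracks only one current node per level, whereas the quoted diff keeps a `candidates` queue and a one-element `results` queue. Any difference in recall between the two approaches would have to be measured rather than inferred from this example.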