[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553467#comment-17553467 ]
Kaival Parikh commented on LUCENE-10611: ---------------------------------------- As a fix, we can check if results are incomplete after [this line|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L89], and return results accordingly {code:java} diff --git a/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java b/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java index b1a2436166f..7c641f077ee 100644 --- a/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java +++ b/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java @@ -87,6 +87,9 @@ public final class HnswGraphSearcher { int numVisited = 0; for (int level = graph.numLevels() - 1; level >= 1; level--) { results = graphSearcher.searchLevel(query, 1, level, eps, vectors, graph, null, visitedLimit); + if (results.incomplete()) { + return results; + } eps[0] = results.pop(); numVisited += results.visitedCount();{code} Alternatively, we do not enforce limits in higher levels by setting the limit as Integer.MAX_VALUE [here|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L89] (also not updating the counts [here|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L92-L93]), but we might end up visiting more nodes than desired {code:java} diff --git a/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java b/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java index b1a2436166f..0101cbd7690 100644 --- a/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java +++ b/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java @@ -86,11 +86,8 @@ public final class HnswGraphSearcher { int[] eps = new int[] {graph.entryNode()}; int numVisited = 0; for (int level = graph.numLevels() - 1; level >= 1; level--) { - results = graphSearcher.searchLevel(query, 1, level, eps, vectors, graph, null, visitedLimit); + results = graphSearcher.searchLevel(query, 1, level, eps, vectors, graph, null, Integer.MAX_VALUE); eps[0] = results.pop(); - - numVisited += results.visitedCount(); - visitedLimit -= results.visitedCount(); } results = graphSearcher.searchLevel(query, topK, 0, eps, vectors, graph, acceptOrds, visitedLimit); {code} > KnnVectorQuery throwing Heap Error for Restrictive Filters > ---------------------------------------------------------- > > Key: LUCENE-10611 > URL: https://issues.apache.org/jira/browse/LUCENE-10611 > Project: Lucene - Core > Issue Type: Bug > Reporter: Kaival Parikh > Priority: Minor > > The HNSW graph search does not consider that visitedLimit may be reached in > the upper levels of graph search itself > This occurs when the pre-filter is too restrictive (and its count sets the > visitedLimit). So instead of switching over to exactSearch, it tries to [pop > from an empty > heap|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90] > and throws an error > > To reproduce this error, we can +increase the numDocs > [here|https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L500] > to 20,000+ (so that nodes have more neighbors, and visitedLimit is reached > faster) > > Stacktrace: > The heap is empty > java.lang.IllegalStateException: The heap is empty > at __randomizedtesting.SeedInfo.seed([D7BC2F56048D9D1A:A1F576DD0E795BBF]:0) > at org.apache.lucene.util.LongHeap.pop(LongHeap.java:111) > at org.apache.lucene.util.hnsw.NeighborQueue.pop(NeighborQueue.java:98) > at > org.apache.lucene.util.hnsw.HnswGraphSearcher.search(HnswGraphSearcher.java:90) > at > org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsReader.search(Lucene92HnswVectorsReader.java:236) > at > org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.search(PerFieldKnnVectorsFormat.java:272) > at > org.apache.lucene.index.CodecReader.searchNearestVectors(CodecReader.java:235) > at > org.apache.lucene.search.KnnVectorQuery.approximateSearch(KnnVectorQuery.java:159) -- This message was sent by Atlassian Jira (v8.20.7#820007) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org