[
https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553467#comment-17553467
]
Kaival Parikh commented on LUCENE-10611:
----------------------------------------
As a fix, we can check if results are incomplete after [this
line|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L89],
and return results accordingly
{code:java}
diff --git
a/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
b/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
index b1a2436166f..7c641f077ee 100644
--- a/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
+++ b/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
@@ -87,6 +87,9 @@ public final class HnswGraphSearcher {
int numVisited = 0;
for (int level = graph.numLevels() - 1; level >= 1; level--) {
results = graphSearcher.searchLevel(query, 1, level, eps, vectors,
graph, null, visitedLimit);
+ if (results.incomplete()) {
+ return results;
+ }
eps[0] = results.pop();
numVisited += results.visitedCount();{code}
Alternatively, we do not enforce limits in higher levels by setting the limit
as Integer.MAX_VALUE
[here|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L89]
(also not updating the counts
[here|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L92-L93]),
but we might end up visiting more nodes than desired
{code:java}
diff --git
a/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
b/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
index b1a2436166f..0101cbd7690 100644
--- a/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
+++ b/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
@@ -86,11 +86,8 @@ public final class HnswGraphSearcher {
int[] eps = new int[] {graph.entryNode()};
int numVisited = 0;
for (int level = graph.numLevels() - 1; level >= 1; level--) {
- results = graphSearcher.searchLevel(query, 1, level, eps, vectors,
graph, null, visitedLimit);
+ results = graphSearcher.searchLevel(query, 1, level, eps, vectors,
graph, null, Integer.MAX_VALUE);
eps[0] = results.pop();
-
- numVisited += results.visitedCount();
- visitedLimit -= results.visitedCount();
}
results =
graphSearcher.searchLevel(query, topK, 0, eps, vectors, graph,
acceptOrds, visitedLimit); {code}
> KnnVectorQuery throwing Heap Error for Restrictive Filters
> ----------------------------------------------------------
>
> Key: LUCENE-10611
> URL: https://issues.apache.org/jira/browse/LUCENE-10611
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Kaival Parikh
> Priority: Minor
>
> The HNSW graph search does not consider that visitedLimit may be reached in
> the upper levels of graph search itself
> This occurs when the pre-filter is too restrictive (and its count sets the
> visitedLimit). So instead of switching over to exactSearch, it tries to [pop
> from an empty
> heap|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90]
> and throws an error
>
> To reproduce this error, we can +increase the numDocs
> [here|https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L500]
> to 20,000+ (so that nodes have more neighbors, and visitedLimit is reached
> faster)
>
> Stacktrace:
> The heap is empty
> java.lang.IllegalStateException: The heap is empty
> at __randomizedtesting.SeedInfo.seed([D7BC2F56048D9D1A:A1F576DD0E795BBF]:0)
> at org.apache.lucene.util.LongHeap.pop(LongHeap.java:111)
> at org.apache.lucene.util.hnsw.NeighborQueue.pop(NeighborQueue.java:98)
> at
> org.apache.lucene.util.hnsw.HnswGraphSearcher.search(HnswGraphSearcher.java:90)
> at
> org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsReader.search(Lucene92HnswVectorsReader.java:236)
> at
> org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.search(PerFieldKnnVectorsFormat.java:272)
> at
> org.apache.lucene.index.CodecReader.searchNearestVectors(CodecReader.java:235)
> at
> org.apache.lucene.search.KnnVectorQuery.approximateSearch(KnnVectorQuery.java:159)
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]