kaivalnp commented on code in PR #12820: URL: https://github.com/apache/lucene/pull/12820#discussion_r1399829104
########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -109,32 +109,36 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight) throw Bits liveDocs = ctx.reader().getLiveDocs(); int maxDoc = ctx.reader().maxDoc(); + Bits acceptDocs; + int cost; if (filterWeight == null) { - return approximateSearch(ctx, liveDocs, Integer.MAX_VALUE); + acceptDocs = liveDocs; + cost = Integer.MAX_VALUE; + } else { + Scorer scorer = filterWeight.scorer(ctx); + if (scorer == null) { + return NO_RESULTS; + } + acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc); + cost = ((BitSet) acceptDocs).cardinality(); } - Scorer scorer = filterWeight.scorer(ctx); - if (scorer == null) { + KnnCollector collector = getCollector(ctx, cost); + if (collector == null) { Review Comment: The `null` check comes from `DiversifyingChildren{Byte,Float}KnnVectorQuery`, where the [`parentBitSet` may be `null`](https://github.com/kaivalnp/lucene/blob/reuse-graph-search/lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java#L68-L72) In this case, we do not want to perform any searches, since there are no matching docs anyways ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -109,32 +109,36 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight) throw Bits liveDocs = ctx.reader().getLiveDocs(); int maxDoc = ctx.reader().maxDoc(); + Bits acceptDocs; + int cost; if (filterWeight == null) { - return approximateSearch(ctx, liveDocs, Integer.MAX_VALUE); + acceptDocs = liveDocs; + cost = Integer.MAX_VALUE; Review Comment: I agree, it looks a bit convoluted now. I'll change it back to the original flow ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -109,32 +109,36 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight) throw Bits liveDocs = ctx.reader().getLiveDocs(); int maxDoc = ctx.reader().maxDoc(); + Bits acceptDocs; + int cost; if (filterWeight == null) { - return approximateSearch(ctx, liveDocs, Integer.MAX_VALUE); + acceptDocs = liveDocs; + cost = Integer.MAX_VALUE; + } else { + Scorer scorer = filterWeight.scorer(ctx); + if (scorer == null) { + return NO_RESULTS; + } + acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc); + cost = ((BitSet) acceptDocs).cardinality(); } - Scorer scorer = filterWeight.scorer(ctx); - if (scorer == null) { + KnnCollector collector = getCollector(ctx, cost); + if (collector == null) { return NO_RESULTS; } - BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc); - int cost = acceptDocs.cardinality(); - - if (cost <= k) { - // If there are <= k possible matches, short-circuit and perform exact search, since HNSW - // must always visit at least k documents - return exactSearch(ctx, new BitSetIterator(acceptDocs, cost)); + if (cost > k) { + // Perform the approximate kNN search + approximateSearch(ctx, acceptDocs, collector); } - // Perform the approximate kNN search - TopDocs results = approximateSearch(ctx, acceptDocs, cost); - if (results.totalHits.relation == TotalHits.Relation.EQUAL_TO) { - return results; - } else { - // We stopped the kNN search because it visited too many nodes, so fall back to exact search - return exactSearch(ctx, new BitSetIterator(acceptDocs, cost)); + if ((cost <= k || collector.earlyTerminated()) && acceptDocs instanceof BitSet bitSet) { + // If there are <= k possible matches, or we stopped the kNN search because it visited too + // many nodes, fall back to exact search + return exactSearch(ctx, new BitSetIterator(bitSet, cost), collector); Review Comment: Yes, this is based on the assumption that any doc "visited but not collected" should be outright rejected from `#exactSearch` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org