Re: [PR] Re-use information from graph traversal during exact search [lucene]

via GitHub Mon, 20 Nov 2023 14:32:18 -0800


kaivalnp commented on code in PR #12820:
URL: https://github.com/apache/lucene/pull/12820#discussion_r1399829104



##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -109,32 +109,36 @@ private TopDocs getLeafResults(LeafReaderContext ctx, 
Weight filterWeight) throw
     Bits liveDocs = ctx.reader().getLiveDocs();
     int maxDoc = ctx.reader().maxDoc();
 
+    Bits acceptDocs;
+    int cost;
     if (filterWeight == null) {
-      return approximateSearch(ctx, liveDocs, Integer.MAX_VALUE);
+      acceptDocs = liveDocs;
+      cost = Integer.MAX_VALUE;
+    } else {
+      Scorer scorer = filterWeight.scorer(ctx);
+      if (scorer == null) {
+        return NO_RESULTS;
+      }
+      acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
+      cost = ((BitSet) acceptDocs).cardinality();
     }
 
-    Scorer scorer = filterWeight.scorer(ctx);
-    if (scorer == null) {
+    KnnCollector collector = getCollector(ctx, cost);
+    if (collector == null) {

Review Comment:
   The `null` check comes from 
`DiversifyingChildren{Byte,Float}KnnVectorQuery`, where the [`parentBitSet` may 
be 
`null`](https://github.com/kaivalnp/lucene/blob/reuse-graph-search/lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java#L68-L72)
   
   In this case, we do not want to perform any searches, since there are no 
matching docs anyways



##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -109,32 +109,36 @@ private TopDocs getLeafResults(LeafReaderContext ctx, 
Weight filterWeight) throw
     Bits liveDocs = ctx.reader().getLiveDocs();
     int maxDoc = ctx.reader().maxDoc();
 
+    Bits acceptDocs;
+    int cost;
     if (filterWeight == null) {
-      return approximateSearch(ctx, liveDocs, Integer.MAX_VALUE);
+      acceptDocs = liveDocs;
+      cost = Integer.MAX_VALUE;

Review Comment:
   I agree, it looks a bit convoluted now. I'll change it back to the original 
flow



##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -109,32 +109,36 @@ private TopDocs getLeafResults(LeafReaderContext ctx, 
Weight filterWeight) throw
     Bits liveDocs = ctx.reader().getLiveDocs();
     int maxDoc = ctx.reader().maxDoc();
 
+    Bits acceptDocs;
+    int cost;
     if (filterWeight == null) {
-      return approximateSearch(ctx, liveDocs, Integer.MAX_VALUE);
+      acceptDocs = liveDocs;
+      cost = Integer.MAX_VALUE;
+    } else {
+      Scorer scorer = filterWeight.scorer(ctx);
+      if (scorer == null) {
+        return NO_RESULTS;
+      }
+      acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
+      cost = ((BitSet) acceptDocs).cardinality();
     }
 
-    Scorer scorer = filterWeight.scorer(ctx);
-    if (scorer == null) {
+    KnnCollector collector = getCollector(ctx, cost);
+    if (collector == null) {
       return NO_RESULTS;
     }
 
-    BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
-    int cost = acceptDocs.cardinality();
-
-    if (cost <= k) {
-      // If there are <= k possible matches, short-circuit and perform exact 
search, since HNSW
-      // must always visit at least k documents
-      return exactSearch(ctx, new BitSetIterator(acceptDocs, cost));
+    if (cost > k) {
+      // Perform the approximate kNN search
+      approximateSearch(ctx, acceptDocs, collector);
     }
 
-    // Perform the approximate kNN search
-    TopDocs results = approximateSearch(ctx, acceptDocs, cost);
-    if (results.totalHits.relation == TotalHits.Relation.EQUAL_TO) {
-      return results;
-    } else {
-      // We stopped the kNN search because it visited too many nodes, so fall 
back to exact search
-      return exactSearch(ctx, new BitSetIterator(acceptDocs, cost));
+    if ((cost <= k || collector.earlyTerminated()) && acceptDocs instanceof 
BitSet bitSet) {
+      // If there are <= k possible matches, or we stopped the kNN search 
because it visited too
+      // many nodes, fall back to exact search
+      return exactSearch(ctx, new BitSetIterator(bitSet, cost), collector);

Review Comment:
   Yes, this is based on the assumption that any doc "visited but not 
collected" should be outright rejected from `#exactSearch`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Re-use information from graph traversal during exact search [lucene]

Reply via email to