jpountz commented on code in PR #14400:
URL: https://github.com/apache/lucene/pull/14400#discussion_r2012975372


##########
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:
##########
@@ -171,37 +171,36 @@ private int scoreWindow(
       }
     }
 
-    if (acceptDocs == null) {
-      int minDocIDRunEnd = max;
-      for (DisiWrapper w : iterators) {
-        if (w.docID() > min) {
-          minDocIDRunEnd = min;
-          break;
-        } else {
-          minDocIDRunEnd = Math.min(minDocIDRunEnd, w.docIDRunEnd());
-        }
-      }
-
-      if (minDocIDRunEnd - min >= WINDOW_SIZE / 2) {
-        // We have a large range of doc IDs that all match.
-        rangeDocIdStream.from = min;
-        rangeDocIdStream.to = minDocIDRunEnd;
-        collector.collect(rangeDocIdStream);
-        return minDocIDRunEnd;
-      }
-    }
-
-    int bitsetWindowMax = (int) Math.min(max, (long) min + WINDOW_SIZE);
-
+    // Partition clauses of the conjunction into:
+    //  - clauses that don't fully match the first half of the window and get 
evaluated via
+    // #loadIntoBitSet or leaf-frog,
+    //  - other clauses that are used to compute the greatest possible window 
size that they fully
+    // match.
+    // This logic helps align scoring windows with the natural #docIDRunEnd() 
boundaries of the
+    // data, which helps evaluate fewer clauses per window - without allowing 
windows to become too
+    // small thanks to the WINDOW_SIZE/2 threshold.
+    int minDocIDRunEnd = max;
     for (DisiWrapper w : iterators) {
-      if (w.docID() > min || w.docIDRunEnd() < bitsetWindowMax) {
+      int docIdRunEnd = w.docIDRunEnd();
+      if (w.docID() > min || (docIdRunEnd - min) < WINDOW_SIZE / 2) {
         windowApproximations.add(w.approximation());
         if (w.twoPhase() != null) {
           windowTwoPhases.add(w.twoPhase());
         }
+      } else {
+        minDocIDRunEnd = Math.min(minDocIDRunEnd, docIdRunEnd);
       }
     }
 
+    if (acceptDocs == null && windowApproximations.isEmpty()) {

Review Comment:
   If accept docs are not null, we shouldn't call `intoBitSet` on any clause. 
However we'll stick to a window of size 4,096 and convert the accept docs 
`Bits` into a bit set using `Bits#applyMask`.
   
   We may be able to do better if deletions are extremely sparse, but I 
couldn't think of an obvious way of handling it and I'm not sure how much this 
case is worth optimizing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to