Re: [PR] Optimize `PointRangeQuery` for intra-segment concurrency with segment-level `DocIdSet` caching [lucene]

via GitHub Mon, 24 Nov 2025 02:10:37 -0800


iverase commented on code in PR #15446:
URL: https://github.com/apache/lucene/pull/15446#discussion_r2555573979



##########
lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java:
##########
@@ -291,49 +358,156 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti
         } else {
           allDocsMatch = false;
         }
-
         if (allDocsMatch) {
           // all docs have a value and all points are within bounds, so everything matches
           return ConstantScoreScorerSupplier.matchAll(score(), scoreMode, reader.maxDoc());
         } else {
-          return new ConstantScoreScorerSupplier(score(), scoreMode, reader.maxDoc()) {
-
-            final DocIdSetBuilder result = new DocIdSetBuilder(reader.maxDoc(), values);
-            final IntersectVisitor visitor = getIntersectVisitor(result);
-            long cost = -1;
-
-            @Override
-            public DocIdSetIterator iterator(long leadCost) throws IOException {
-              if (values.getDocCount() == reader.maxDoc()
-                  && values.getDocCount() == values.size()
-                  && cost() > reader.maxDoc() / 2) {
-                // If all docs have exactly one value and the cost is greater
-                // than half the leaf size then maybe we can make things faster
-                // by computing the set of documents that do NOT match the range
-                final FixedBitSet result = new FixedBitSet(reader.maxDoc());
-                long[] cost = new long[1];
-                values.intersect(getInverseIntersectVisitor(result, cost));
-                // Flip the bit set and cost
-                result.flip(0, reader.maxDoc());
-                cost[0] = Math.max(0, reader.maxDoc() - cost[0]);
-                return new BitSetIterator(result, cost[0]);
-              }
+          // Get or create the cached supplier for this segment
+          SegmentDocIdSetSupplier segmentSupplier =
+              segmentCache.computeIfAbsent(partition.ctx, ctx -> new SegmentDocIdSetSupplier(ctx));
+          // Each call creates a new PartitionScorerSupplier and all partitions share the same
+          // SegmentDocIdSetSupplier
+          return new PartitionScorerSupplier(
+              segmentSupplier, partition.minDocId, partition.maxDocId, score(), scoreMode);
+        }
+      }
 
-              values.intersect(visitor);
-              return result.build().iterator();
-            }
+      /** ScorerSupplier for a partition that filters results from the shared segment DocIdSet. */
+      final class PartitionScorerSupplier extends ScorerSupplier {
+        private final SegmentDocIdSetSupplier segmentSupplier;
+        private final int minDocId;
+        private final int maxDocId;
+        private final float score;
+        private final ScoreMode scoreMode;
+
+        PartitionScorerSupplier(
+            SegmentDocIdSetSupplier segmentSupplier,
+            int minDocId,
+            int maxDocId,
+            float score,
+            ScoreMode scoreMode) {
+          this.segmentSupplier = segmentSupplier;
+          this.minDocId = minDocId;
+          this.maxDocId = maxDocId;
+          this.score = score;
+          this.scoreMode = scoreMode;
+        }
 
-            @Override
-            public long cost() {
-              if (cost == -1) {
-                // Computing the cost may be expensive, so only do it if necessary
-                cost = values.estimateDocCount(visitor);
-                assert cost >= 0;
-              }
-              return cost;
-            }
-          };
+        @Override
+        public Scorer get(long leadCost) throws IOException {
+          DocIdSetIterator iterator = getIterator();
+          if (iterator == null) {
+            return null;
+          }
+          return new ConstantScoreScorer(score, scoreMode, iterator);
         }
+
+        private DocIdSetIterator getIterator() throws IOException {
+          // Get the shared DocIdSet (built once per segment)
+          DocIdSet docIdSet = segmentSupplier.getOrBuild();
+          DocIdSetIterator fullIterator = docIdSet.iterator();
+          if (fullIterator == null) {
+            return null;
+          }
+          // Check if this is a full segment (no partition filtering needed)
+          boolean isFullSegment = (minDocId == 0 && maxDocId == DocIdSetIterator.NO_MORE_DOCS);
+          if (isFullSegment) {
+            return fullIterator;
+          }
+          // Wrap iterator to filter to partition range
+          return new PartitionFilteredDocIdSetIterator(fullIterator, minDocId, maxDocId);
+        }
+
+        @Override
+        public long cost() {
+          DocIdSet docIdSet;
+          try {
+            docIdSet = segmentSupplier.getOrBuild();
+          } catch (IOException e) {
+            throw new RuntimeException(e);
+          }
+          long totalCost = docIdSet.iterator().cost();
+          boolean isFullSegment = (minDocId == 0 && maxDocId == DocIdSetIterator.NO_MORE_DOCS);
+          if (isFullSegment) {
+            return totalCost;
+          }
+          int segmentSize = segmentSupplier.context.reader().maxDoc();
+          int partitionSize = maxDocId - minDocId;
+          return (totalCost * partitionSize) / segmentSize;
+        }
+
+        @Override
+        public BulkScorer bulkScorer() throws IOException {
+          Scorer scorer = get(Long.MAX_VALUE);
+          if (scorer == null) {
+            return null;
+          }
+          return new Weight.DefaultBulkScorer(scorer);
+        }
+      }
+
+      /**
+       * Iterator that filters another iterator to only return docs within a partition range.
+       * Reading from a FixedBitSet is thread-safe (just reading from long[]), so multiple
+       * partitions can read from the same underlying DocIdSet concurrently.
+       */

Review Comment:
   This is wrong: we cannot share the iterator between partitions, even when 
the underlying data structure is a FixedBitSet.
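
   One way to read this concern: an iterator carries its own cursor, so even if the bits underneath are read-only, two partitions that consume the same iterator instance interfere with each other. The sketch below is a toy illustration only. `ToyIterator`, `collect`, and `SharedIteratorDemo` are hypothetical names, `java.util.BitSet` stands in for the real bit set, and none of this is Lucene API or code from the PR.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SharedIteratorDemo {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Toy forward-only iterator over a read-only BitSet (hypothetical stand-in). */
  static final class ToyIterator {
    private final BitSet bits;
    private int doc = -1; // mutable cursor: this is the state that cannot be shared

    ToyIterator(BitSet bits) {
      this.bits = bits;
    }

    int nextDoc() {
      return advance(doc + 1);
    }

    int advance(int target) {
      if (doc == NO_MORE_DOCS) {
        return doc; // already exhausted
      }
      int from = Math.max(target, doc + 1); // the cursor never moves backwards
      int next = bits.nextSetBit(from);
      doc = next < 0 ? NO_MORE_DOCS : next;
      return doc;
    }
  }

  /** Collects the docs in [min, max) by driving the given iterator. */
  static List<Integer> collect(ToyIterator it, int min, int max) {
    List<Integer> out = new ArrayList<>();
    for (int d = it.advance(min); d < max; d = it.nextDoc()) {
      out.add(d);
    }
    return out;
  }

  static BitSet sampleBits() {
    BitSet bits = new BitSet(8);
    bits.set(1);
    bits.set(2);
    bits.set(5);
    bits.set(6);
    return bits;
  }

  /** Both partitions drive ONE iterator: the first one consumes doc 5 while looking for its end. */
  static List<List<Integer>> sharedIterator() {
    ToyIterator shared = new ToyIterator(sampleBits());
    List<Integer> first = collect(shared, 0, 4);
    List<Integer> second = collect(shared, 4, 8);
    return List.of(first, second);
  }

  /** Each partition gets its own iterator over the same read-only bits. */
  static List<List<Integer>> separateIterators() {
    BitSet bits = sampleBits();
    List<Integer> first = collect(new ToyIterator(bits), 0, 4);
    List<Integer> second = collect(new ToyIterator(bits), 4, 8);
    return List.of(first, second);
  }

  public static void main(String[] args) {
    System.out.println("shared:   " + sharedIterator());
    System.out.println("separate: " + separateIterators());
  }
}
```

   In this toy the shared case loses doc 5 even without any threads involved; with real concurrency the interleaving would be unpredictable on top of that. Sharing the immutable bits is a separate question from sharing the cursor that walks them.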



