jpountz commented on code in PR #14273:
URL: https://github.com/apache/lucene/pull/14273#discussion_r2014552652


##########
lucene/core/src/java/org/apache/lucene/search/DocIdStream.java:
##########
@@ -34,12 +33,35 @@ protected DocIdStream() {}
    * Iterate over doc IDs contained in this stream in order, calling the given {@link
    * CheckedIntConsumer} on them. This is a terminal operation.
    */
-  public abstract void forEach(CheckedIntConsumer<IOException> consumer) throws IOException;
+  public void forEach(CheckedIntConsumer<IOException> consumer) throws IOException {
+    forEach(DocIdSetIterator.NO_MORE_DOCS, consumer);
+  }
+
+  /**
+   * Iterate over doc IDs contained in this doc ID stream up to the given {@code upTo} exclusive,
+   * calling the given {@link CheckedIntConsumer} on them. It is not possible to iterate these doc
+   * IDs again later on.
+   */
+  public abstract void forEach(int upTo, CheckedIntConsumer<IOException> consumer)
+      throws IOException;
 
   /** Count the number of entries in this stream. This is a terminal operation. */
   public int count() throws IOException {
     int[] count = new int[1];
     forEach(doc -> count[0]++);
     return count[0];
   }
+
+  /**
+   * Count the number of doc IDs in this stream that are below the given {@code upTo}. These doc IDs
+   * may not be consumed again later.
+   */
+  public int count(int upTo) throws IOException {

Review Comment:
   > Are you thinking of peeking into these bit sets to provide cardinality up to the specific doc? (Or maybe I'm missing something?)
   
   Yes, exactly. I already have something locally; I just need to beef up the testing a bit.
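
For illustration only, here is a rough sketch of what counting up to a given doc could look like on a bitset-backed stream. It is not the code from this PR: the class `BitSetDocIdStreamSketch`, its fields, and the `long[]` window layout are all hypothetical stand-ins, and the only library call assumed is the JDK's `Long.bitCount`. The idea is to popcount the words below `upTo` rather than visit each doc ID.

```java
/** Hypothetical stand-in for a bitset-backed doc ID stream; not Lucene's actual class. */
final class BitSetDocIdStreamSketch {
  private final long[] bits; // bit i set means doc (base + i) matches
  private final int base;    // doc ID that bit 0 maps to
  private int from;          // first doc ID that has not been consumed yet

  BitSetDocIdStreamSketch(long[] bits, int base) {
    this.bits = bits;
    this.base = base;
    this.from = base;
  }

  /** Counts matching doc IDs in [from, upTo) and marks them as consumed. */
  int count(int upTo) {
    int max = base + bits.length * Long.SIZE; // exclusive end of the window
    int to = Math.max(from, Math.min(upTo, max));
    int count = countRange(from - base, to - base);
    from = to; // these doc IDs may not be consumed again
    return count;
  }

  /** Popcount of the bits in [start, end), one 64-bit word at a time. */
  private int countRange(int start, int end) {
    int count = 0;
    int i = start;
    while (i < end) {
      int word = i >>> 6;
      int wordEnd = Math.min(end, (word + 1) << 6);
      // Keep bits at or above i and below wordEnd within the current word.
      // Java masks long shift distances to their low 6 bits.
      long mask = (-1L << i) & (-1L >>> -wordEnd);
      count += Long.bitCount(bits[word] & mask);
      i = wordEnd;
    }
    return count;
  }
}
```

With a window `{0b1011L, 1L}` based at doc 100 (docs 100, 101, 103 and 164 match), successive calls `count(102)`, `count(104)` and `count(Integer.MAX_VALUE)` each only cover the doc IDs that earlier calls left unconsumed.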
   
   The bitset-based `DocIdStream` is one interesting implementation. The other is the one backed by a range of doc IDs that all match. It is used internally by queries that fully match a segment (e.g. `PointRangeQuery` when all of the segment's values are contained in the query range, or `MatchAllDocsQuery`) and by queries on fields that are part of (or correlate with) the index sort fields. See #14312 for reference.
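
To make the range-backed case concrete, here is a minimal sketch under stated assumptions. `RangeDocIdStreamSketch` is a hypothetical class, not the implementation referenced in #14312, and it uses the JDK's `IntConsumer` instead of Lucene's `CheckedIntConsumer<IOException>` to stay self-contained. When every doc ID in `[from, to)` matches, `count(upTo)` reduces to a subtraction.

```java
import java.util.function.IntConsumer;

/** Hypothetical stand-in for a stream over a dense range of matching doc IDs. */
final class RangeDocIdStreamSketch {
  private int from;     // next doc ID to consume, inclusive
  private final int to; // end of the matching range, exclusive

  RangeDocIdStreamSketch(int from, int to) {
    if (from > to) {
      throw new IllegalArgumentException("from must be <= to");
    }
    this.from = from;
    this.to = to;
  }

  /** Visits doc IDs in [from, min(upTo, to)) and consumes them. */
  void forEach(int upTo, IntConsumer consumer) {
    int end = Math.max(from, Math.min(upTo, to));
    for (int doc = from; doc < end; doc++) {
      consumer.accept(doc);
    }
    from = end;
  }

  /** Counts doc IDs below upTo in constant time and consumes them. */
  int count(int upTo) {
    int end = Math.max(from, Math.min(upTo, to));
    int count = end - from;
    from = end; // these doc IDs may not be consumed again
    return count;
  }
}
```

Passing `Integer.MAX_VALUE` as `upTo` (the value of `DocIdSetIterator.NO_MORE_DOCS`) drains whatever remains, which mirrors how the no-argument `forEach` in the diff above delegates to the `upTo` variant.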



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org