[PR] Add bulk-retrieval API to NumericDocValues. [lucene]

via GitHub Wed, 03 Sep 2025 01:29:51 -0700


jpountz opened a new pull request, #15149:
URL: https://github.com/apache/lucene/pull/15149


   Lucene recently got very good performance improvements by introducing APIs 
that apply to batches of doc IDs at once: `DocIdSetIterator#intoBitSet`, 
`PostingsEnum#nextPostings`, `Scorer#nextDocsAndScores` and `SimScorer#score`. 
This helps better amortize the cost of virtual function calls across many doc 
IDs, and also apply additional optimizations, e.g. it's more efficient to 
bulk-iterate set bits in a `FixedBitSet` than to iterate them one-by-one via 
`FixedBitSet#nextSetBit`.
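
   As a rough illustration of the bit-set point above, here is a self-contained sketch that contrasts a one-call-per-bit loop with a word-at-a-time loop over a plain `long[]`. The class and method names (`BulkBitIteration`, `collectOneByOne`, `collectBulk`) are hypothetical, and the `nextSetBit` helper is a deliberately simplified stand-in, not Lucene's `FixedBitSet` implementation.

```java
import java.util.Arrays;

public class BulkBitIteration {

  /** Simplified stand-in for a nextSetBit-style call: returns the next set bit >= from, or -1. */
  static int nextSetBit(long[] words, int from) {
    int numBits = words.length << 6;
    for (int i = from; i < numBits; i++) {
      // Shifting a long only uses the low 6 bits of the shift distance, so this is i % 64.
      if ((words[i >> 6] & (1L << i)) != 0) {
        return i;
      }
    }
    return -1;
  }

  /** One method call per set bit. */
  static int collectOneByOne(long[] words, int[] out) {
    int count = 0;
    for (int bit = nextSetBit(words, 0); bit != -1; bit = nextSetBit(words, bit + 1)) {
      out[count++] = bit;
    }
    return count;
  }

  /** Bulk variant: load each 64-bit word once and peel off its set bits in a tight loop. */
  static int collectBulk(long[] words, int[] out) {
    int count = 0;
    for (int w = 0; w < words.length; w++) {
      long word = words[w];
      while (word != 0) {
        out[count++] = (w << 6) + Long.numberOfTrailingZeros(word);
        word &= word - 1; // clear the lowest set bit
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long[] words = {0b1011L, 0L, 1L << 63};
    int[] docs = new int[words.length * Long.SIZE];
    int size = collectBulk(words, docs);
    System.out.println(Arrays.toString(Arrays.copyOf(docs, size)));
  }
}
```

   Both methods produce the same doc IDs; the bulk variant simply does the per-word work once and skips empty words in a single comparison.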
   
   This PR introduces bulk retrieval for numeric doc values. It is currently 
only implemented on norms and used to retrieve norms for doc IDs to score, but 
I tried to design the API in a way that also works for numeric doc values and 
is sustainable. Specifically, I'm thinking that optimizing the single-valued 
and dense case should go a very long way, so I did not try to help users 
retrieve information about which docs have a value or not. In some cases, this 
is not even needed. E.g. if you want to compute the sum of the values of a 
field, returning 0 for docs that don't have a value is fine. When 
knowing which docs have a value is important (such as in Lucene's 
`HistogramCollector`), it is still possible to optimize the case when there are 
long runs of docs with a value, with something like the following:
   
   ```java
   void doSomethingWith(int size, int[] docs, NumericDocValues values) throws IOException {
     if (size > 0 && values.advanceExact(docs[0]) && values.docIDRunEnd() > docs[size - 1]) {
       long[] longValues = new long[size];
       values.longValues(size, docs, longValues);
       // do something with the `longValues` array
     } else {
       // use #advanceExact / #longValue directly
       for (int i = 0; i < size; i++) {
         if (values.advanceExact(docs[i])) {
           // do something with values#longValue
         } else {
           // handle the case when docs don't have a value
         }
       }
     }
   }
   ```
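
   The sum case mentioned earlier can be sketched with a toy model. Everything here is an assumption for illustration: `ToyNumericValues` is a hypothetical in-memory stand-in rather than a Lucene class, its `longValues(size, docs, out)` signature mirrors the snippet above, and it is assumed to write 0 for docs without a value.

```java
import java.util.Map;
import java.util.TreeMap;

public class BulkSumSketch {

  /** Hypothetical toy stand-in for a doc-values reader; not a Lucene class. */
  static final class ToyNumericValues {
    private final Map<Integer, Long> valuesByDoc = new TreeMap<>();

    ToyNumericValues put(int doc, long value) {
      valuesByDoc.put(doc, value);
      return this;
    }

    /** Fills out[i] with the value of docs[i], or 0 when the doc has no value (assumed behavior). */
    void longValues(int size, int[] docs, long[] out) {
      for (int i = 0; i < size; i++) {
        out[i] = valuesByDoc.getOrDefault(docs[i], 0L);
      }
    }
  }

  /** Sums values for a batch of docs; missing docs contribute 0, so no per-doc existence check is needed. */
  static long sum(int size, int[] docs, ToyNumericValues values) {
    long[] buffer = new long[size];
    values.longValues(size, docs, buffer);
    long total = 0;
    for (int i = 0; i < size; i++) {
      total += buffer[i];
    }
    return total;
  }

  public static void main(String[] args) {
    ToyNumericValues values = new ToyNumericValues().put(1, 10L).put(4, 32L);
    int[] docs = {1, 2, 4}; // doc 2 has no value
    System.out.println(sum(docs.length, docs, values));
  }
}
```

   Because a missing value is the additive identity, the caller never needs to know which docs were absent, which is why the dense, single-valued path can stay simple.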


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

