jpountz opened a new pull request, #15149:
URL: https://github.com/apache/lucene/pull/15149
Lucene recently gained significant performance improvements by introducing APIs
that operate on batches of doc IDs at once: `DocIdSetIterator#intoBitSet`,
`PostingsEnum#nextPostings`, `Scorer#nextDocsAndScores` and `SimScorer#score`.
Batching amortizes the cost of virtual function calls across many doc IDs and
also enables additional optimizations. For example, it is more efficient to
bulk-iterate set bits in a `FixedBitSet` than to iterate them one-by-one via
`FixedBitSet#nextSetBit`.
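To illustrate why bulk iteration over set bits can be cheaper, here is a minimal sketch over a plain `long[]` of 64-bit words. The class `BulkBitIterationSketch` and its methods are hypothetical stand-ins written for this example; they are not Lucene's `FixedBitSet` implementation. The one-by-one variant has to recompute the word index and mask on every call, while the bulk variant loads each word once and drains all of its set bits in a tight loop.

```java
import java.util.Arrays;

/** Hypothetical bit set over 64-bit words, for illustration only (not Lucene's FixedBitSet). */
final class BulkBitIterationSketch {

  /** One-by-one: returns the index of the first set bit at or after {@code from}, or -1. */
  static int nextSetBit(long[] words, int from) {
    int wordIndex = from >> 6;
    if (wordIndex >= words.length) {
      return -1;
    }
    // Mask off the bits below `from` in the first word (long shifts use the low 6 bits).
    long word = words[wordIndex] & (-1L << from);
    while (true) {
      if (word != 0) {
        return (wordIndex << 6) + Long.numberOfTrailingZeros(word);
      }
      if (++wordIndex == words.length) {
        return -1;
      }
      word = words[wordIndex];
    }
  }

  /** Bulk: writes the indexes of all set bits into {@code out} and returns how many were written. */
  static int collectSetBits(long[] words, int[] out) {
    int count = 0;
    for (int i = 0; i < words.length; i++) {
      long word = words[i];
      while (word != 0) {
        out[count++] = (i << 6) + Long.numberOfTrailingZeros(word);
        word &= word - 1; // clear the lowest set bit
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long[] words = new long[2];
    for (int bit : new int[] {1, 5, 64, 127}) {
      words[bit >> 6] |= 1L << bit;
    }
    int[] out = new int[128];
    int count = collectSetBits(words, out);
    System.out.println(Arrays.toString(Arrays.copyOf(out, count)));
  }
}
```

Both methods visit the same bits; the difference is only in how much per-bit bookkeeping and how many calls are needed to do so.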
This PR introduces bulk retrieval for numeric doc values. It is currently
only implemented on norms and used to retrieve norms for doc IDs to score, but
I tried to design the API in a way that also works for numeric doc values and
is sustainable. Specifically, I'm thinking that optimizing the single-valued
and dense case should go a very long way, so I did not try to help users
retrieve information about which docs have a value or not. In some cases, this
is not even needed. E.g. if you want to compute the sum of the values of a
field, returning 0 for docs that don't have a value is fine. When knowing
which docs have a value is important (such as in Lucene's
`HistogramCollector`), it is still possible to optimize the case when there are
long runs of docs with a value, with something like the following:
```java
void doSomethingWith(int size, int[] docs, NumericDocValues values) {
  if (size > 0 && values.advanceExact(docs[0]) && values.docIDRunEnd() > docs[size - 1]) {
    long[] longValues = new long[size];
    values.longValues(size, docs, longValues);
    // do something with the `longValues` array
  } else {
    // use #advanceExact / #longValue directly
    for (int i = 0; i < size; i++) {
      if (values.advanceExact(docs[i])) {
        // do something with values#longValue
      } else {
        // handle the case when docs don't have a value
      }
    }
  }
}
```
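For the simpler sum case mentioned above, where a default of 0 for missing docs is acceptable, a minimal self-contained sketch could look like the following. The class `BulkSumSketch` is a hypothetical in-memory stand-in, not Lucene's `NumericDocValues`; its `longValues(size, docs, out)` method only mirrors the shape of the call used in the snippet above and assumes that docs without a value yield 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory value source, for illustration only (not Lucene's NumericDocValues). */
final class BulkSumSketch {
  private final Map<Integer, Long> valuesByDoc = new HashMap<>();

  void put(int doc, long value) {
    valuesByDoc.put(doc, value);
  }

  /** Bulk retrieval: out[i] receives the value of docs[i], or 0 if that doc has no value (assumed). */
  void longValues(int size, int[] docs, long[] out) {
    for (int i = 0; i < size; i++) {
      out[i] = valuesByDoc.getOrDefault(docs[i], 0L);
    }
  }

  /** Sums the values of the given docs; missing docs contribute 0, so no per-doc check is needed. */
  static long sum(BulkSumSketch values, int size, int[] docs) {
    long[] buffer = new long[size];
    values.longValues(size, docs, buffer);
    long sum = 0;
    for (int i = 0; i < size; i++) {
      sum += buffer[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    BulkSumSketch values = new BulkSumSketch();
    values.put(0, 10L);
    values.put(2, 5L); // doc 1 intentionally has no value
    System.out.println(sum(values, 3, new int[] {0, 1, 2}));
  }
}
```

Because 0 is the identity for addition, the caller never needs to know which docs were missing, which is what makes a bulk call without presence information sufficient here.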
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]