jpountz commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2139074516
> If I understand correctly, the read ahead mechanism in IndexInput will be useful if matching docs fall within the read ahead size. Otherwise those will be wasted pages cached or downloaded in the warm index use-case and prefetch will not be useful.

This is correct. For the record, this wastage may sound disappointing, but it also helps make I/O more concurrent. For instance, say you have a conjunction on two clauses: "a AND b" (which could be postings, but also doc-value-based iterators, e.g. via a `FieldExistsQuery`). First we advance `a`, then we advance `b` to the next doc that is on or after the doc that `a` is on. If we don't want to prefetch data without evidence that it's actually going to be needed, then we have no way of doing the I/O for `a` and `b` in parallel, since we need to finish the I/O for `a` before we can know what to prefetch for `b`.

> Sure, I can take a stab for say NumericDocValues and in context of facets to start with.

This sounds fine, we need to start somewhere. FWIW the main consumers of the `NumericDocValues` API that we should care about, in my opinion, are `NumericComparator`, `SortedNumericDocValuesRangeQuery`, `LeafSimScorer` and the `lucene/facet` module. Ideally we'd come up with an approach that works well for all these consumers.
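To make the dependency in the "a AND b" example concrete, here is a minimal sketch of leapfrog-style intersection over two sorted doc-ID lists. `SortedDocs` and `intersect` are hypothetical names made up for illustration and are backed by in-memory arrays; they are not Lucene's `DocIdSetIterator` or its conjunction code. The point is only that the target passed to `b.advance(...)` is not known until `a.advance(...)` has returned.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {

    /** Hypothetical stand-in for a doc-ID iterator over a sorted array (not a Lucene class). */
    static final class SortedDocs {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int idx = -1;

        SortedDocs(int... docs) {
            this.docs = docs;
        }

        /** Current doc ID, -1 before the first advance, NO_MORE_DOCS once exhausted. */
        int docID() {
            if (idx < 0) {
                return -1;
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        /** Moves to the first doc that is on or after target and returns it. */
        int advance(int target) {
            idx = Math.max(idx, 0);
            while (idx < docs.length && docs[idx] < target) {
                idx++;
            }
            return docID();
        }
    }

    /** Returns the doc IDs present in both iterators. */
    static List<Integer> intersect(SortedDocs a, SortedDocs b) {
        List<Integer> hits = new ArrayList<>();
        int doc = a.advance(0);
        while (doc != SortedDocs.NO_MORE_DOCS) {
            // The target for b only exists once a has been positioned, so in a
            // real index any I/O for b would have to wait for a's I/O to finish.
            int other = b.advance(doc);
            if (other == doc) {
                hits.add(doc);
                doc = a.advance(doc + 1);
            } else {
                // b skipped ahead; let a catch up to b's position.
                doc = a.advance(other);
            }
        }
        return hits;
    }
}
```

In this sketch every call to `b.advance` takes its argument from the result of the preceding call on `a`, which is the serialization described above. Reading ahead speculatively on both clauses is one way to break that chain, at the cost of sometimes loading pages that are never used.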