Re: [I] Improve Lucene's I/O concurrency [lucene]

via GitHub Thu, 30 May 2024 01:52:26 -0700


jpountz commented on issue #13179:
URL: https://github.com/apache/lucene/issues/13179#issuecomment-2139074516


   > If I understand correctly, the read ahead mechanism in IndexInput will be 
useful if matching docs fall within the read ahead size. Otherwise those will 
be wasted pages cached or downloaded in the warm index use-case and prefetch 
will not be useful.
   
   This is correct. For the record, this wastage may sound disappointing, but 
it also helps make I/O more concurrent. For instance, say you have a 
conjunction of two clauses, "a AND b" (which could be postings, but also 
doc-value-based iterators, e.g. via a `FieldExistsQuery`). First we advance 
`a`, then we advance `b` to the next doc that is on or after the doc that `a` 
is on. If we refuse to prefetch data until we have evidence that it is actually 
going to be needed, then we have no way of doing the I/O for `a` and `b` in 
parallel, since we need to finish the I/O for `a` before we can know what to 
prefetch for `b`.
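   To make that dependency concrete, here is a minimal, self-contained Java 
sketch of a two-clause leapfrog intersection. It is not Lucene's actual 
conjunction code: `ConjunctionSketch` and `SortedDocIterator` are hypothetical 
names, the iterator reads from an in-memory array, and `advance()` only mirrors 
the `DocIdSetIterator` contract of returning the first doc on or after the 
target. The comments mark where each step has to wait for the previous one.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {
    // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical in-memory stand-in for a postings or doc-values iterator. */
    public static final class SortedDocIterator {
        private final int[] docs; // strictly increasing doc IDs
        private int index = -1;
        private int doc = -1;

        public SortedDocIterator(int... docs) {
            this.docs = docs;
        }

        public int docID() {
            return doc;
        }

        public int nextDoc() {
            return advance(doc + 1);
        }

        /**
         * Moves to the first doc on or after target (requires target > docID()).
         * Against an on-disk index, this is the call that may block on I/O.
         */
        public int advance(int target) {
            index++;
            // Linear scan for clarity; real iterators jump ahead using skip data.
            while (index < docs.length && docs[index] < target) {
                index++;
            }
            doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
            return doc;
        }
    }

    /** Leapfrog intersection of two iterators ("a AND b"). */
    public static List<Integer> intersect(SortedDocIterator a, SortedDocIterator b) {
        List<Integer> matches = new ArrayList<>();
        int doc = a.nextDoc(); // the I/O for `a` has to finish first...
        while (doc != NO_MORE_DOCS) {
            // ...because the target passed to `b` is only known once `a` is positioned.
            int other = b.docID() < doc ? b.advance(doc) : b.docID();
            if (other == doc) {
                matches.add(doc);
                doc = a.nextDoc();
            } else {
                // `b` overshot, so `a` must catch up: another dependent step.
                doc = a.advance(other);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedDocIterator a = new SortedDocIterator(1, 3, 5, 7, 9);
        SortedDocIterator b = new SortedDocIterator(2, 3, 4, 9, 10);
        System.out.println(intersect(a, b)); // prints [3, 9]
    }
}
```

   Every `b.advance(doc)` takes as its target the doc that `a` just produced 
(and vice versa when `b` overshoots), so without speculative read-ahead the 
reads for the two clauses happen one after the other rather than concurrently.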
   
   > Sure, I can take a stab for say NumericDocValues and in context of facets 
to start with.
   
   This sounds fine; we need to start somewhere. FWIW, in my opinion the main 
consumers of the `NumericDocValues` API that we should care about are 
`NumericComparator`, `SortedNumericDocValuesRangeQuery`, `LeafSimScorer` and 
the `lucene/facet` module. Ideally we'd come up with an approach that works 
well for all of these consumers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
