alamb opened a new pull request, #21351: URL: https://github.com/apache/datafusion/pull/21351
Stacked on
- https://github.com/apache/datafusion/pull/21342

which is in turn stacked on
- https://github.com/apache/datafusion/pull/21327
- https://github.com/apache/datafusion/pull/21340

## Which issue does this PR close?

- Part of https://github.com/apache/datafusion/issues/20529
- Broken out of https://github.com/apache/datafusion/pull/20820

## Rationale for this change

The point of this sequence of PRs is to enable dynamic work scheduling in the FileStream, so that a task that finishes its own work can look for any remaining work.

## What changes are included in this PR?

1. Add shared state to FileStream for siblings
2. Sibling streams put their file work into a shared queue when it can be reordered

Note that several other things are NOT included in this PR, including:

1. Trying to limit concurrent IO (this PR has the same properties as main: up to one outstanding IO per partition)
2. Trying to issue multiple IOs from the same partition (that is, to interleave IO and CPU work)

## Are these changes tested?

Yes, by existing functional and benchmark tests, as well as new functional tests.

## Are there any user-facing changes?

Yes, faster performance (TODO MEASURE)
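
## Illustrative sketch

The shared-queue idea listed under "What changes are included in this PR?" can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in this PR: `FileWork`, `SharedWorkQueue`, `SiblingStream`, and the `can_reorder` flag are hypothetical names. The rule that a stream which must preserve order keeps its files private and never takes shared work is likewise an assumption made for the example.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for one unit of file work (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
pub struct FileWork {
    pub path: String,
}

/// Hypothetical shared state: one queue visible to every sibling stream.
#[derive(Clone, Default)]
pub struct SharedWorkQueue {
    inner: Arc<Mutex<VecDeque<FileWork>>>,
}

impl SharedWorkQueue {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append files to the back of the shared queue.
    pub fn push_all(&self, files: impl IntoIterator<Item = FileWork>) {
        self.inner.lock().unwrap().extend(files);
    }

    /// Take the next file from the front, or `None` if no work remains.
    pub fn next_file(&self) -> Option<FileWork> {
        self.inner.lock().unwrap().pop_front()
    }
}

/// Hypothetical per-partition stream that may share its work with siblings.
pub struct SiblingStream {
    local: VecDeque<FileWork>,
    shared: SharedWorkQueue,
    can_reorder: bool,
}

impl SiblingStream {
    /// If the files can be reordered, donate them to the shared queue;
    /// otherwise keep them in a private queue (assumption for this sketch).
    pub fn new(files: Vec<FileWork>, shared: SharedWorkQueue, can_reorder: bool) -> Self {
        let mut local = VecDeque::new();
        if can_reorder {
            shared.push_all(files);
        } else {
            local.extend(files);
        }
        Self { local, shared, can_reorder }
    }

    /// Reorderable streams pull whatever work remains in the shared queue;
    /// order-preserving streams only drain their own files.
    pub fn next_file(&mut self) -> Option<FileWork> {
        if self.can_reorder {
            self.shared.next_file()
        } else {
            self.local.pop_front()
        }
    }
}
```

In this sketch, a sibling that was assigned a single small file keeps calling `next_file` and ends up processing files originally assigned to a slower sibling, which is the dynamic scheduling behavior the rationale describes. Each stream still requests one file at a time, consistent with the note that limiting or multiplexing IO is out of scope here.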
