alamb opened a new pull request, #21351:
URL: https://github.com/apache/datafusion/pull/21351

   Stacked on
   - https://github.com/apache/datafusion/pull/21342
   Which is then stacked on
   - https://github.com/apache/datafusion/pull/21327
   - https://github.com/apache/datafusion/pull/21340
   
   ## Which issue does this PR close?
   
   - part of https://github.com/apache/datafusion/issues/20529
   - Broken out of https://github.com/apache/datafusion/pull/20820
   
   ## Rationale for this change
   
   The goal of this sequence of PRs is to enable dynamic work scheduling in the 
FileStream, so that a task that finishes its own work can pick up any remaining work.
   
   ## What changes are included in this PR?
   
   1. Add state to FileStream that is shared among sibling streams
   2. Sibling streams put their file work into a shared queue when that work can 
be reordered
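   
   As a rough illustration of the idea only (not the code in this PR), a shared 
queue drained by sibling streams might look like the sketch below. All names 
here (`SharedWorkQueue`, `run_sibling`, `process_all`) are hypothetical, files 
are represented by plain path strings, and threads stand in for partitions; the 
actual FileStream types and scheduling logic may differ.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the state shared by sibling streams.
#[derive(Default)]
struct SharedWorkQueue {
    files: Mutex<VecDeque<String>>,
}

impl SharedWorkQueue {
    /// Add files whose processing order does not matter.
    fn push_all(&self, files: impl IntoIterator<Item = String>) {
        self.files.lock().unwrap().extend(files);
    }

    /// Take the next unclaimed file, if any remain.
    fn pop(&self) -> Option<String> {
        self.files.lock().unwrap().pop_front()
    }
}

/// Hypothetical sibling stream: keeps taking files until the queue is empty,
/// so a sibling that finishes early picks up whatever work remains.
fn run_sibling(queue: Arc<SharedWorkQueue>) -> Vec<String> {
    let mut processed = Vec::new();
    while let Some(file) = queue.pop() {
        // A real stream would open and decode the file here.
        processed.push(file);
    }
    processed
}

/// Pool the files of every partition, then run one sibling per partition.
/// Returns the files each sibling ended up processing.
fn process_all(partitions: Vec<Vec<String>>) -> Vec<Vec<String>> {
    let queue = Arc::new(SharedWorkQueue::default());
    let sibling_count = partitions.len();
    for files in partitions {
        queue.push_all(files);
    }
    let handles: Vec<_> = (0..sibling_count)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || run_sibling(queue))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

   In this sketch every file is processed exactly once, but which sibling 
handles a given file depends on timing, which is why it only applies to work 
that can be reordered.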
   
   Note that several other things are NOT included in this PR, including:
   1. Limiting concurrent IO (this PR has the same properties as main: up to one 
outstanding IO per partition)
   2. Issuing multiple IOs from the same partition (i.e., interleaving IO and 
CPU work)
   
   
   ## Are these changes tested?
   
   Yes, by existing functional and benchmark tests, as well as new functional 
tests
   
   ## Are there any user-facing changes?
   Yes, faster performance (TODO MEASURE)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

