alamb opened a new pull request, #21852:
URL: https://github.com/apache/datafusion/pull/21852

   TODO
   - [ ] Tests
   - [ ] Make a PR to refactor the large `poll_scan` loop into separate 
functions to reduce the indent level / control flow
   
   ## Which issue does this PR close?
   
   - Closes https://github.com/apache/datafusion/issues/21598
   
   ## Rationale for this change
   
   In https://github.com/apache/datafusion/pull/21342#discussion_r3070184104, 
@adriangb pointed out that the current 
[Morsel](https://github.com/apache/datafusion/blob/04dbbbf6694a4b162f76aee0091fdc3a47d2f9f0/datafusion/datasource/src/morsel/mod.rs#L52)
 API relies on a comment rather than the type system to separate IO and CPU work. 
   
   Note also that the current Parquet opener does in fact now do IO in the 
stream reader, which makes it harder to overlap IO and CPU work.
   
   ## What changes are included in this PR?
   
   It would be nice to:
   1. Make the morsel API harder to misuse 
   2. Avoid IO after morsels are ready
   
   I don't expect this change to have much of an actual impact (yet), but I do 
expect that it will set us up for better IO interleaving.
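
To illustrate the two goals listed in this section, here is a minimal sketch of separating IO from CPU work with types instead of a comment. It is not the API in this PR: `FakeStore`, `PendingMorsel`, `ReadyMorsel`, `fetch`, and `process` are all hypothetical names made up for the example, and the sketch is synchronous to stay self-contained. The idea is only that the CPU-stage type never holds a handle that can do IO, so "no IO after the morsel is ready" is checked by the compiler.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an object store. In this sketch it is the
/// only type that can perform "IO" (an in-memory lookup here).
pub struct FakeStore {
    files: HashMap<String, Vec<u8>>,
    reads: usize,
}

impl FakeStore {
    pub fn new() -> Self {
        Self { files: HashMap::new(), reads: 0 }
    }

    /// Register a file so later reads can find it.
    pub fn put(&mut self, path: &str, bytes: &[u8]) {
        self.files.insert(path.to_string(), bytes.to_vec());
    }

    /// Number of read attempts made so far.
    pub fn reads(&self) -> usize {
        self.reads
    }

    /// Private: only `PendingMorsel::fetch` below calls this.
    fn read(&mut self, path: &str) -> Option<Vec<u8>> {
        self.reads += 1;
        self.files.get(path).cloned()
    }
}

/// Stage 1 (hypothetical): a morsel that still needs IO.
pub struct PendingMorsel {
    path: String,
}

impl PendingMorsel {
    pub fn new(path: &str) -> Self {
        Self { path: path.to_string() }
    }

    /// Consumes the pending morsel, performs all of its IO up front, and
    /// returns a morsel whose bytes are already in memory.
    pub fn fetch(self, store: &mut FakeStore) -> Option<ReadyMorsel> {
        let bytes = store.read(&self.path)?;
        Some(ReadyMorsel { bytes })
    }
}

/// Stage 2 (hypothetical): CPU-only. It holds no store handle, so
/// nothing reachable from this type can perform IO.
pub struct ReadyMorsel {
    bytes: Vec<u8>,
}

impl ReadyMorsel {
    /// Placeholder for CPU work such as decoding: sums the bytes.
    pub fn process(self) -> u64 {
        self.bytes.iter().map(|&b| u64::from(b)).sum()
    }
}
```

With this shape, a scheduler could call `fetch` on several pending morsels while `process` runs on ready ones. Because `process` takes no store argument, IO cannot be added to the CPU stage without changing a signature.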
   
   ## Are these changes tested?
   Yes, by CI and new tests (to be written).
   
   ## Are there any user-facing changes?
   The unreleased Morsel API is slightly different.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

