alamb opened a new pull request, #21852: URL: https://github.com/apache/datafusion/pull/21852
TODO
- [ ] Tests
- [ ] Make a PR to refactor the large `poll_scan` loop into separate functions to reduce the indent level / control flow

## Which issue does this PR close?

- Closes https://github.com/apache/datafusion/issues/21598

## Rationale for this change

In https://github.com/apache/datafusion/pull/21342#discussion_r3070184104, @adriangb pointed out that the current [Morsel](https://github.com/apache/datafusion/blob/04dbbbf6694a4b162f76aee0091fdc3a47d2f9f0/datafusion/datasource/src/morsel/mod.rs#L52) API relied on a comment rather than the type system to separate IO and CPU.

Also, the current Parquet opener currently does IO in the stream reader, which makes overlapping IO and CPU harder.

## What changes are included in this PR?

It would be nice to
1. Make the morsel API harder to misuse
2. Avoid IO after morsels are ready

I don't expect this change to have much of an actual impact (yet), but I do expect that it will set us up for better IO interleaving.

## Are these changes tested?

Yes, by CI and new tests (to be written).

## Are there any user-facing changes?

The unreleased Morsel API is slightly different.
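
## Illustration (not part of the PR)

To make the rationale concrete, here is a minimal typestate sketch of what "separating IO and CPU with the type system rather than a comment" could look like. Everything in it is hypothetical: `FakeStore`, `PendingMorsel`, `ReadyMorsel`, `load`, and `decode` are illustrative names, not the DataFusion Morsel API and not necessarily the design this PR lands on. It is also synchronous for brevity, whereas a real opener would presumably be async.

```rust
/// Hypothetical stand-in for a remote object store. It counts reads so
/// that callers can check when IO actually happens.
pub struct FakeStore {
    data: Vec<u8>,
    reads: usize,
}

impl FakeStore {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data, reads: 0 }
    }

    /// Number of read calls issued so far.
    pub fn reads(&self) -> usize {
        self.reads
    }

    /// Simulated IO: copy a byte range out of the store.
    fn read(&mut self, offset: usize, len: usize) -> Vec<u8> {
        self.reads += 1;
        self.data[offset..offset + len].to_vec()
    }
}

/// A morsel whose bytes have not been fetched yet (hypothetical type).
/// The only operation it offers is `load`, which performs the IO.
pub struct PendingMorsel {
    offset: usize,
    len: usize,
}

/// A morsel whose bytes are already in memory (hypothetical type).
/// It holds no handle to the store, so `decode` cannot perform IO.
pub struct ReadyMorsel {
    bytes: Vec<u8>,
}

impl PendingMorsel {
    pub fn new(offset: usize, len: usize) -> Self {
        Self { offset, len }
    }

    /// IO phase: consumes the pending morsel and returns a ready one.
    pub fn load(self, store: &mut FakeStore) -> ReadyMorsel {
        ReadyMorsel {
            bytes: store.read(self.offset, self.len),
        }
    }
}

impl ReadyMorsel {
    /// CPU phase: decode little-endian u32 values from the resident bytes.
    /// Trailing bytes that do not fill a whole u32 are ignored.
    pub fn decode(self) -> Vec<u32> {
        self.bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}
```

Because `ReadyMorsel` owns its bytes and has no reference to the store, the compiler rejects any attempt to do IO during decoding, which is the property a comment alone cannot enforce.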
