Dandandan opened a new pull request, #21330: URL: https://github.com/apache/datafusion/pull/21330
## Which issue does this PR close?

<!-- Related to the same approach as the SortPreservingMergeExec lazy spawning work. -->

## Rationale for this change

Currently, `RepartitionExec::execute()` eagerly calls `ensure_input_streams_initialized()`, which opens all input streams immediately, even before any output partition is polled. This means that when only one output partition is needed (or when the consumer isn't ready yet), all input partitions are already executing and buffering data.

For example, in a `HashJoinExec` plan, both the build and probe side `RepartitionExec` nodes start pulling data eagerly, even though the probe side isn't consumed until the build side completes, wasting memory and I/O.

## What changes are included in this PR?

Removes the eager `ensure_input_streams_initialized` call from `execute()`. The `consume_input_streams` method (called on first poll via `futures::stream::once`) already handles the `NotInitialized` state, so this was purely redundant eager work.

Updated `error_for_input_exec` test to expect the error on stream poll rather than on `execute()`, since initialization is now deferred.

## Are these changes tested?

Covered by existing repartition tests (33 pass). One test updated to match new lazy behavior.

## Are there any user-facing changes?

No API changes. Errors from input `execute()` calls are now surfaced on stream poll rather than on `execute()`, which is consistent with how other deferred errors work in DataFusion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
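The deferred initialization described above can be illustrated with a small, std-only sketch. This is not DataFusion code: it is a synchronous analogy that uses an `Iterator` in place of an async stream, and every name in it (`Input`, `State`, `LazyOutput`, `execute`, `demo`) is hypothetical. It only aims to show the two observable effects the PR describes: no input is opened by `execute()`, and an input error appears on the first poll instead.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for one input partition. It records how many
/// times any input has been opened via a shared counter.
struct Input {
    opened: Rc<Cell<usize>>,
    fail: bool,
}

impl Input {
    fn open(&self) -> Result<Vec<i32>, String> {
        self.opened.set(self.opened.get() + 1);
        if self.fail {
            Err("input failed to open".to_string())
        } else {
            Ok(vec![1, 2, 3])
        }
    }
}

/// Simplified state machine (assumption: the real operator tracks more).
enum State {
    NotInitialized(Vec<Input>),
    Consuming(std::vec::IntoIter<i32>),
    Done,
}

struct LazyOutput {
    state: State,
}

/// Lazy `execute`: only stores the inputs, opens nothing.
fn execute(inputs: Vec<Input>) -> Result<LazyOutput, String> {
    Ok(LazyOutput { state: State::NotInitialized(inputs) })
}

impl Iterator for LazyOutput {
    type Item = Result<i32, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // On the first poll, move out of NotInitialized and open the inputs.
        match std::mem::replace(&mut self.state, State::Done) {
            State::NotInitialized(inputs) => {
                let mut rows = Vec::new();
                for input in &inputs {
                    match input.open() {
                        Ok(batch) => rows.extend(batch),
                        // The error surfaces here, on poll; state stays Done.
                        Err(e) => return Some(Err(e)),
                    }
                }
                self.state = State::Consuming(rows.into_iter());
            }
            other => self.state = other,
        }
        match &mut self.state {
            State::Consuming(rows) => rows.next().map(Ok),
            _ => None,
        }
    }
}

/// Returns (inputs opened after execute, inputs opened after first poll,
/// whether the first poll yielded an error).
fn demo() -> (usize, usize, bool) {
    let opened = Rc::new(Cell::new(0));
    let inputs = vec![
        Input { opened: opened.clone(), fail: false },
        Input { opened: opened.clone(), fail: true },
    ];
    let mut out = execute(inputs).expect("lazy execute does not open inputs");
    let before_poll = opened.get();
    let first = out.next();
    (before_poll, opened.get(), matches!(first, Some(Err(_))))
}
```

In this sketch, calling `demo()` shows that the open counter is still zero after `execute()` returns and only advances once the output is polled, which mirrors why the `error_for_input_exec` test now expects the failure on stream poll.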
