Dandandan opened a new issue, #21329:
URL: https://github.com/apache/datafusion/issues/21329
### Is your feature request related to a problem or challenge?
SortPreservingMergeExec::execute() eagerly calls execute() on all input
partitions and spawns buffered tasks immediately, before the output stream is
ever polled. As a result, resources are allocated and work begins even if the
stream is never consumed (e.g. the query is cancelled before the first poll),
and a plan containing many SortPreservingMergeExec nodes creates an
unnecessary burst of concurrent tasks.
### Describe the solution you'd like
Defer the spawning of input partition tasks and construction of the
streaming merge to the first poll_next() call on the output stream, rather than
doing it eagerly in execute(). This can be done with a
wrapper stream that holds the initialization state and transitions from
Pending to Running on first poll. The single-partition and zero-partition fast
paths can remain unchanged.
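
The wrapper described above could look roughly like the following. This is a minimal, std-only sketch and not the DataFusion implementation: `SimpleStream`, `LazyStream`, `LazyState`, `VecStream`, `NoopWake`, `collect_all` and `demo` are all hypothetical names, and `SimpleStream` is a stand-in for `futures::Stream`. A real version would presumably wrap a `SendableRecordBatchStream` and would also need to expose the schema before initialization. The atomic counter stands in for "spawning input partition tasks".

```rust
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `futures::Stream`, so the sketch needs only std.
trait SimpleStream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// `Pending` holds the deferred initialization; `Running` holds the built stream.
enum LazyState<F, S> {
    Pending(Option<F>),
    Running(S),
}

/// Wrapper that runs `init` on the first `poll_next` call, not at construction.
struct LazyStream<F, S> {
    state: LazyState<F, S>,
}

impl<F: FnOnce() -> S, S> LazyStream<F, S> {
    fn new(init: F) -> Self {
        Self { state: LazyState::Pending(Some(init)) }
    }
}

impl<F, S> SimpleStream for LazyStream<F, S>
where
    F: FnOnce() -> S + Unpin,
    S: SimpleStream + Unpin,
{
    type Item = S::Item;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<S::Item>> {
        let this = self.get_mut();
        // Take the init closure out if we are still in the Pending state.
        let init = match &mut this.state {
            LazyState::Pending(init) => init.take(),
            LazyState::Running(_) => None,
        };
        // First poll: build the inner stream and transition Pending -> Running.
        if let Some(init) = init {
            this.state = LazyState::Running(init());
        }
        match &mut this.state {
            LazyState::Running(inner) => Pin::new(inner).poll_next(cx),
            LazyState::Pending(_) => unreachable!("state was initialized above"),
        }
    }
}

/// Trivial inner stream used only for the demonstration.
struct VecStream(std::vec::IntoIter<i32>);

impl SimpleStream for VecStream {
    type Item = i32;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<i32>> {
        Poll::Ready(self.0.next())
    }
}

/// Waker that does nothing; enough to drive streams that are always ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a stream to completion (busy-polls on `Pending`, fine for this demo).
fn collect_all<S: SimpleStream + Unpin>(mut stream: S) -> Vec<S::Item> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut items = Vec::new();
    loop {
        match Pin::new(&mut stream).poll_next(&mut cx) {
            Poll::Ready(Some(item)) => items.push(item),
            Poll::Ready(None) => return items,
            Poll::Pending => continue,
        }
    }
}

/// Returns (init count before first poll, init count after draining, items).
fn demo() -> (usize, usize, Vec<i32>) {
    let spawned = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&spawned);
    let stream = LazyStream::new(move || {
        counter.fetch_add(1, Ordering::SeqCst); // stands in for spawning tasks
        VecStream(vec![1, 2, 3].into_iter())
    });
    let before = spawned.load(Ordering::SeqCst);
    let items = collect_all(stream);
    (before, spawned.load(Ordering::SeqCst), items)
}
```

In this sketch, constructing the `LazyStream` does no work: the counter only increments once the stream is first polled, and dropping the stream before that point never runs the closure. Keeping the closure in an `Option` lets the `FnOnce` be moved out through a mutable reference.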
### Describe alternatives you've considered
_No response_
### Additional context
_No response_