Dandandan opened a new issue, #21329:
URL: https://github.com/apache/datafusion/issues/21329

   ### Is your feature request related to a problem or challenge?
   
   `SortPreservingMergeExec::execute()` eagerly calls `execute()` on all input partitions and spawns buffered tasks immediately, before the output stream is ever polled. As a result, resources are allocated and work begins even if the stream is never consumed (e.g. the query is cancelled before the first poll), and a plan containing many `SortPreservingMergeExec` nodes produces an unnecessary burst of concurrent tasks.
   
   
   ### Describe the solution you'd like
   
   Defer the spawning of input partition tasks and the construction of the streaming merge to the first `poll_next()` call on the output stream, rather than doing it eagerly in `execute()`. This can be done with a wrapper stream that holds the initialization state and transitions from Pending to Running on first poll. The single-partition and zero-partition fast paths can remain unchanged.
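
The Pending-to-Running wrapper described above could look roughly like the sketch below. This is a simplified, synchronous model and not DataFusion code: it uses `Iterator` as a stand-in for an async stream (so `next()` plays the role of `poll_next()`), and the names `LazyStream`, `Init`, and `Batch` are hypothetical, introduced only for illustration.

```rust
/// Hypothetical stand-in for a record batch.
type Batch = u32;

/// Deferred setup work. In the real operator this would be where the input
/// partitions are executed and the streaming merge is built (assumption).
type Init = Box<dyn FnOnce() -> Box<dyn Iterator<Item = Batch>>>;

/// Two-state wrapper mirroring the Pending -> Running transition.
enum LazyStream {
    /// Nothing has been started yet; only the setup closure is held.
    Pending(Option<Init>),
    /// Setup has run; the live inner stream is held.
    Running(Box<dyn Iterator<Item = Batch>>),
}

impl LazyStream {
    /// Stores the setup closure without running it.
    fn new<F>(init: F) -> Self
    where
        F: FnOnce() -> Box<dyn Iterator<Item = Batch>> + 'static,
    {
        LazyStream::Pending(Some(Box::new(init)))
    }
}

impl Iterator for LazyStream {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // First poll: run the deferred setup exactly once and switch state.
        if let LazyStream::Pending(init) = self {
            let init = init.take().expect("closure is present while Pending");
            *self = LazyStream::Running(init());
        }
        // Every poll, including the first, is forwarded to the inner stream.
        match self {
            LazyStream::Running(inner) => inner.next(),
            LazyStream::Pending(_) => unreachable!("switched to Running above"),
        }
    }
}
```

In this model, constructing a `LazyStream` and dropping it without calling `next()` never runs the setup closure, which corresponds to the "query cancelled before first poll" case. An async version would presumably implement the `Stream` trait and perform the same transition inside `poll_next()`.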
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

