Kontinuation commented on PR #21425:
URL: https://github.com/apache/datafusion/pull/21425#issuecomment-4306344566

   > One concern I've found experimenting with doing this with streams: During 
execution, let's say we have `SortExec.poll_next()` → 
`RepartitionExec.poll_next()` → `try_grow()` FAILS → `pool.reclaim()` → sends 
signal to `SortExec` → waits for `SortExec` → DEADLOCK (`SortExec` is stuck 
waiting for `RepartitionExec` to return)
   > 
   > Basically we could have one operator A which is streaming data to another B 
which runs out of memory and wants to reclaim from A.
   
   Processing the next batch and reclaiming memory in the same loop may work for 
a pipeline execution engine, but it does not work well for a Volcano-style 
execution engine such as DataFusion.
   
   In a pipeline execution engine, a central scheduler manages control flow and 
invokes operators directly. Because RepartitionExec and SortExec are decoupled 
in the scheduler's task logic, the engine can emit a reclamation request to 
SortExec even while RepartitionExec is active, without risking a call-stack 
deadlock.
   
   In DataFusion’s async Volcano model, the control flow is nested within the 
operators themselves via async streams. This means SortExec and RepartitionExec 
are coupled in the async poll stack. If a child operator (RepartitionExec) 
blocks while waiting for a memory reclamation response from its parent 
(SortExec), it creates a circular dependency: the parent cannot spill because 
it is suspended awaiting the child's output.
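
   The circular wait described above can be reproduced in a minimal, 
synchronous sketch. This is only an illustration: `Parent`, `Child`, and 
`ReclaimRequest` are hypothetical names, not DataFusion types, and a timeout 
stands in for what would otherwise be an indefinite hang.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::time::Duration;

/// Hypothetical reclamation request; the requester waits on `reply`.
struct ReclaimRequest {
    reply: Sender<usize>,
}

/// Stands in for the child operator (RepartitionExec in the discussion).
struct Child {
    to_parent: Sender<ReclaimRequest>,
}

impl Child {
    fn next_batch(&self) -> Result<Vec<i64>, RecvTimeoutError> {
        // Pretend the allocation failed: ask the parent to free memory
        // and wait for its answer before producing a batch.
        let (reply_tx, reply_rx) = channel();
        self.to_parent
            .send(ReclaimRequest { reply: reply_tx })
            .expect("parent receiver is alive");
        // Without the timeout this call would never return.
        let _freed = reply_rx.recv_timeout(Duration::from_millis(50))?;
        Ok(vec![1, 2, 3])
    }
}

/// Stands in for the parent operator (SortExec in the discussion).
struct Parent {
    child: Child,
    requests: Receiver<ReclaimRequest>,
    buffered_bytes: usize,
}

impl Parent {
    fn next_batch(&mut self) -> Result<Vec<i64>, RecvTimeoutError> {
        // The parent is stuck inside the child's call here ...
        let batch = self.child.next_batch()?;
        // ... so it only gets to serve reclamation requests afterwards,
        // which is too late for the child that is waiting on the reply.
        while let Ok(req) = self.requests.try_recv() {
            let _ = req.reply.send(self.buffered_bytes);
            self.buffered_bytes = 0;
        }
        Ok(batch)
    }
}

/// Returns true when the child timed out waiting for the parent.
fn demo_circular_wait() -> bool {
    let (tx, rx) = channel();
    let mut parent = Parent {
        child: Child { to_parent: tx },
        requests: rx,
        buffered_bytes: 1024,
    };
    matches!(parent.next_batch(), Err(RecvTimeoutError::Timeout))
}
```

   Calling `demo_circular_wait()` returns `true`: the request sits unread in 
the parent's queue because the parent cannot run its own code until the child 
returns.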
   
   To avoid this, we should embrace the concurrent nature of memory allocation 
and reclamation. Spillable operators must implement spilling as an 
asynchronous, independent operation that does not rely on the current execution 
stack. We should establish clear implementation guidelines for these operators 
to ensure they can safely handle reclamation requests while in a suspended 
state.
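
   One possible shape for such a guideline, as a sketch only: the operator 
keeps its spillable buffers in shared state and hands out a handle that can 
spill them without the operator itself being polled. `SortState`, 
`SpillHandle`, and `SortOperator` are hypothetical names for illustration and 
are not part of DataFusion's API; the "spill" here just moves batches into a 
second vector instead of writing to disk.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical spillable state, shared between the operator and its handle.
#[derive(Default)]
struct SortState {
    in_memory: Vec<Vec<i64>>,
    spilled: Vec<Vec<i64>>, // stand-in for spill files on disk
}

impl SortState {
    fn reserved_bytes(&self) -> usize {
        self.in_memory
            .iter()
            .map(|batch| batch.len() * std::mem::size_of::<i64>())
            .sum()
    }
}

/// Handle a memory pool could hold; it does not borrow the operator.
#[derive(Clone)]
struct SpillHandle {
    state: Arc<Mutex<SortState>>,
}

impl SpillHandle {
    /// Spills all buffered batches and reports how many bytes were freed.
    /// Runs on the caller's thread or task, not on the operator's stack.
    fn reclaim(&self) -> usize {
        let mut state = self.state.lock().unwrap();
        let freed = state.reserved_bytes();
        let batches = std::mem::take(&mut state.in_memory);
        state.spilled.extend(batches);
        freed
    }
}

struct SortOperator {
    state: Arc<Mutex<SortState>>,
}

impl SortOperator {
    fn new() -> (Self, SpillHandle) {
        let state = Arc::new(Mutex::new(SortState::default()));
        let handle = SpillHandle { state: Arc::clone(&state) };
        (SortOperator { state }, handle)
    }

    fn push_batch(&self, batch: Vec<i64>) {
        // The lock is held only for the push, never across a wait on the child.
        self.state.lock().unwrap().in_memory.push(batch);
    }
}

/// Returns (bytes freed, batches left in memory, batches spilled).
fn demo_independent_reclaim() -> (usize, usize, usize) {
    let (sort, handle) = SortOperator::new();
    sort.push_batch(vec![3, 1, 2]);
    sort.push_batch(vec![9, 8]);
    // The operator is now idle, as if suspended awaiting its child.
    // Another thread (the child that ran out of memory) triggers the spill.
    let freed = std::thread::spawn(move || handle.reclaim()).join().unwrap();
    let state = sort.state.lock().unwrap();
    (freed, state.in_memory.len(), state.spilled.len())
}
```

   Here `demo_independent_reclaim()` returns `(40, 0, 2)`. The important 
constraint in this sketch is that the operator never holds the state lock 
while it waits on its input; otherwise the same circular wait reappears 
through the lock.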


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

