Kontinuation commented on code in PR #21425:
URL: https://github.com/apache/datafusion/pull/21425#discussion_r3105583899


##########
datafusion/execution/src/memory_pool/pool.rs:
##########
@@ -523,6 +503,57 @@ impl<I: MemoryPool> MemoryPool for TrackConsumersPool<I> {
         Ok(())
     }
 
+    fn reclaim(
+        &self,
+        target_bytes: usize,
+        exclude_consumer_id: Option<usize>,
+    ) -> Result<usize> {

Review Comment:
   `TrackConsumersPool` needs to be redesigned as we introduce cooperative 
memory management: tracking consumers would be a hard requirement for 
cooperative memory management to work. Defining it as a decorator would be a 
bit strange:
   
   1. `GreedyMemoryPool` won't reclaim other consumer's memory unless being 
used with `TrackConsumersPool`.
   2. There is no point reclaiming other consumer's memory when using 
`FairSpillPool`.
   
   I think we can define a common `ConsumerTracker` and integrate it into both 
`GreedyMemoryPool` and `FairSpillPool`. `GreedyMemoryPool::reclaim` implements 
cooperative memory reclamation while `FairSpillPool` just need an empty 
implementation. Having consumer tracking always on won't bring too much 
performance penalty while enabling rich memory usage reporting on memory 
reservation failures, and I haven't seen a reason for not enabling it in 
practice.



##########
datafusion/execution/src/memory_pool/mod.rs:
##########
@@ -205,6 +205,19 @@ pub trait MemoryPool: Send + Sync + std::fmt::Debug {
     /// On error the `allocation` will not be increased in size
     fn try_grow(&self, reservation: &MemoryReservation, additional: usize) -> 
Result<()>;
 
+    /// Attempt to reclaim `target_bytes` from existing spillable consumers 
already registered
+    /// with this pool.
+    ///
+    /// `exclude_consumer_id`, when provided, identifies the current requester 
and should not be
+    /// reclaimed from to avoid re-entering the same operator while it is 
mid-allocation.
+    fn reclaim(

Review Comment:
   Reclamation of memory usually involves spilling internal states to disk, 
which is usually async. Defining `reclaim` as a synchronous interface might be 
unnatural.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to