ariel-miculas commented on PR #22729:
URL: https://github.com/apache/datafusion/pull/22729#issuecomment-4611260579

   I'm curious about the high-level vision: is the plan to close 
https://github.com/apache/datafusion/pull/15591 in favor of this new approach?
   
   I would like the redesign of hash aggregation to take into account the 
memory constraints imposed by the finite memory pool, i.e. how the 
implementation behaves under OOM conditions. In particular:
   * How do we improve memory accounting? (see 
https://github.com/apache/datafusion/issues/22526)
   * How do we avoid excessive memory allocations under OOM conditions? (see 
https://github.com/apache/datafusion/pull/22165)
   * How do we address other issues such as 
https://github.com/apache/datafusion/issues/19906?
   
   Otherwise we'll end up with the same issues that exist now. E.g. 
EmitTo::First(n) wasn't designed for emitting a large portion of the existing 
groups, so it over-allocated when used to emit early in the partial 
aggregation OOM case.
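
   To make the concern concrete, here is a minimal sketch of one generic way 
a "take the first n" emit can hold more memory than expected. It is a 
stand-alone illustration, not DataFusion's code: `take_first` and 
`capacities_after_emit` are hypothetical names, and whether this is the exact 
mechanism behind the over-allocation mentioned above is an assumption.

```rust
/// Hypothetical stand-in for a "first n" emit over a Vec of group state.
/// Returns the first `n` elements and leaves the rest in `v`.
pub fn take_first<T>(v: &mut Vec<T>, n: usize) -> Vec<T> {
    // `split_off` allocates a NEW Vec for the tail [n, len) and leaves `v`
    // holding the head [0, n) with its capacity unchanged.
    let mut tail = v.split_off(n);
    // Swap so that `v` keeps the remaining groups and we return the head.
    std::mem::swap(v, &mut tail);
    tail // after the swap this is the head, still backed by the old buffer
}

/// Reports (capacity of emitted Vec, capacity of remaining Vec) after
/// emitting the first `n` of `total` elements.
pub fn capacities_after_emit(total: usize, n: usize) -> (usize, usize) {
    let mut groups: Vec<u64> = Vec::with_capacity(total);
    groups.extend(0..total as u64);
    let emitted = take_first(&mut groups, n);
    (emitted.capacity(), groups.capacity())
}
```

   With `capacities_after_emit(1000, 900)`, the emitted Vec still owns the 
original buffer (capacity of at least 1000) while a second buffer is 
allocated for the 100 remaining elements, so the peak footprint grows at the 
moment the operator is trying to free memory.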


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

