nathanb9 commented on PR #21995:
URL: https://github.com/apache/datafusion/pull/21995#issuecomment-4385191161

   Sounds good, thanks. Yep, a physical rule is preferred because some of the 
additional GroupJoin optimizations add complexity that I'm not sure can be 
handled in the logical layer alone.
   
   When a GroupJoin opportunity is found, we always use the Memoizing GroupJoin: 
simply build the hash table on the left side with accumulators embedded, then 
update them in place during the probe.
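
   To make the build/probe flow concrete, here is a minimal standalone sketch 
in plain Rust. It is not the PR's code and does not use DataFusion APIs: the 
names `Acc` and `group_join` are hypothetical, and it assumes a query of the 
shape `SELECT r.key, SUM(s.val) FROM r JOIN s ON r.key = s.key GROUP BY r.key` 
where `r.key` is unique on the left side.

```rust
use std::collections::HashMap;

/// Hypothetical accumulator state stored directly in the hash table entry.
#[derive(Debug, Default, Clone, PartialEq)]
struct Acc {
    sum: i64,     // running SUM(s.val)
    matched: u64, // number of probe rows that hit this entry
}

/// Sketch of a memoizing GroupJoin (assumption: left keys are unique and the
/// group key equals the join key).
fn group_join(left_keys: &[u32], right: &[(u32, i64)]) -> Vec<(u32, i64)> {
    // Build phase: one hash table entry per left key, with an empty
    // accumulator embedded in the entry.
    let mut table: HashMap<u32, Acc> = HashMap::with_capacity(left_keys.len());
    for &k in left_keys {
        table.entry(k).or_default();
    }

    // Probe phase: each matching right row updates the accumulator in place;
    // no joined row is materialized.
    for &(k, v) in right {
        if let Some(acc) = table.get_mut(&k) {
            acc.sum += v;
            acc.matched += 1;
        }
    }

    // Emit phase: inner-join semantics, so entries that never matched are dropped.
    let mut out: Vec<(u32, i64)> = table
        .into_iter()
        .filter(|(_, acc)| acc.matched > 0)
        .map(|(k, acc)| (k, acc.sum))
        .collect();
    out.sort_unstable(); // sorted only to make the output deterministic
    out
}
```

   For left keys `[1, 2, 3]` and right rows `[(1, 10), (1, 5), (3, 7), (4, 100)]`, 
this returns `[(1, 15), (3, 7)]`: key 2 never matches and key 4 has no build-side 
entry.
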
   Then, with its own physical plan, we can do additional optimizations, for 
example:
   
   One big one from the papers is "Eager Right Aggregation", which just 
pre-aggregates the probe side before the join, reducing its cardinality from |S| 
to |distinct(S.join_key)|. It is ideal when most right-side groups have a 
corresponding value (one way to verify this is a foreign key constraint, which 
can be added with eager aggregation).
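
   As a rough illustration of the cardinality reduction, here is a small 
standalone Rust sketch. It is not taken from the PR or the papers: the function 
names `pre_aggregate` and `join_pre_aggregated` are hypothetical, the aggregate 
is assumed to be `SUM`, and the left keys are assumed unique so that summing 
before the join gives the same result as summing after it.

```rust
use std::collections::{HashMap, HashSet};

/// Pre-aggregate the probe side by join key (SUM per key), shrinking it from
/// |S| rows to one row per distinct join key.
fn pre_aggregate(right: &[(u32, i64)]) -> HashMap<u32, i64> {
    let mut partial: HashMap<u32, i64> = HashMap::new();
    for &(k, v) in right {
        *partial.entry(k).or_insert(0) += v;
    }
    partial
}

/// Join the pre-aggregated probe side against the build side
/// (assumption: left keys are unique, so each partial sum is used at most once).
fn join_pre_aggregated(left_keys: &[u32], partial: &HashMap<u32, i64>) -> Vec<(u32, i64)> {
    // Build side: the set of left join keys.
    let build: HashSet<u32> = left_keys.iter().copied().collect();

    // Probe with one row per distinct right key instead of one per right row.
    let mut out = Vec::new();
    for (&k, &sum) in partial {
        if build.contains(&k) {
            out.push((k, sum));
        }
    }
    out.sort_unstable(); // sorted only to make the output deterministic
    out
}
```

   With right rows `[(1, 10), (1, 5), (3, 7), (4, 100)]`, the join probes 3 
pre-aggregated rows instead of 4. The partial sum for key 4 is computed and 
then discarded, which is the wasted work that a foreign key constraint would 
rule out.
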


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

