Re: [PR] fix: preserve duplicate GROUPING SETS rows [datafusion]

via GitHub Thu, 02 Apr 2026 08:21:31 -0700


xiedeyantu commented on PR #21058:
URL: https://github.com/apache/datafusion/pull/21058#issuecomment-4178647113


   > Sorry for the delayed response, @xiedeyantu !
   > 
   > Thanks for revising this. I'm a bit concerned by the overhead here; we are 
added a `UInt32` column to _every_ query with grouping sets just to handle a 
relatively rare situation.
   > 
   > I actually liked the approach you took in the initial PR better; encoding 
duplicates into the high bits of the grouping ID is a nice approach. We'd just 
need to take care to mask out the high bits in the `GROUPING` function, and if 
possible it would be nice to avoid arbitrary limits like not supporting 
duplicate grouping sets for queries with 8/16/etc. grouping sets.
   > 
   > Alternatively we could represent grouping IDs as an index into the list of 
GROUPING SETS. That would provide an ID without concern for duplicates, and 
then we'd implement the `GROUPING` function with a `CASE` or similar construct.
   
   @neilconway Thanks for your kind guidance. I have refactored this PR 
according to your comments. Due to the significant changes, I force-pushed the 
commit. I think this version is better than the previous one. If you have time, 
please take a look again. Sorry for making multiple changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: preserve duplicate GROUPING SETS rows [datafusion]

Reply via email to