xiedeyantu commented on PR #21058: URL: https://github.com/apache/datafusion/pull/21058#issuecomment-4178647113
> Sorry for the delayed response, @xiedeyantu ! > > Thanks for revising this. I'm a bit concerned by the overhead here; we are added a `UInt32` column to _every_ query with grouping sets just to handle a relatively rare situation. > > I actually liked the approach you took in the initial PR better; encoding duplicates into the high bits of the grouping ID is a nice approach. We'd just need to take care to mask out the high bits in the `GROUPING` function, and if possible it would be nice to avoid arbitrary limits like not supporting duplicate grouping sets for queries with 8/16/etc. grouping sets. > > Alternatively we could represent grouping IDs as an index into the list of GROUPING SETS. That would provide an ID without concern for duplicates, and then we'd implement the `GROUPING` function with a `CASE` or similar construct. @neilconway Thanks for your kind guidance. I have refactored this PR according to your comments. Due to the significant changes, I force-pushed the commit. I think this version is better than the previous one. If you have time, please take a look again. Sorry for making multiple changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
