xiedeyantu opened a new issue, #21316: URL: https://github.com/apache/datafusion/issues/21316
### Describe the bug

When `GROUPING SETS` contains duplicate grouping lists, DataFusion incorrectly collapses them during execution. The internal `grouping_id` only encodes the semantic null mask, so repeated grouping sets share the same execution key and are merged together, causing rows to be lost.

This behavior differs from PostgreSQL, which preserves duplicate grouping sets and returns duplicated result rows.

### To Reproduce

Create the sample table:

```
create table duplicate_grouping_sets(deptno int, job varchar, sal int, comm int);
insert into duplicate_grouping_sets values
  (10, 'CLERK', 1300, null),
  (20, 'MANAGER', 3000, null);
```

Run the query:

```
select deptno, job, sal, sum(comm),
       grouping(deptno), grouping(job), grouping(sal)
from duplicate_grouping_sets
group by grouping sets ((deptno, job), (deptno, sal), (deptno, job))
order by deptno, job, sal, grouping(deptno), grouping(job), grouping(sal);
```

Compare the result with PostgreSQL. PostgreSQL returns 6 rows, including the duplicate `(deptno, job)` grouping set, while DataFusion currently returns only 4 rows.

### Expected behavior

Duplicate `GROUPING SETS` entries should be preserved during execution, and the query result should match PostgreSQL behavior, returning duplicated rows when the grouping set list contains duplicates.

### Additional context

_No response_

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
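
For illustration, the collision described under "Describe the bug" can be reproduced with a toy model in Python. This is only a sketch and not DataFusion's actual code: the names `null_mask`, `aggregate`, and `include_ordinal` are hypothetical, and adding the grouping set's position to the key is just one assumed way to keep duplicates apart.

```python
from collections import defaultdict

COLUMNS = ("deptno", "job", "sal")

def null_mask(grouping_set):
    """Bitmask with one bit per column, set when the column is NOT grouped."""
    mask = 0
    for col in COLUMNS:
        mask = (mask << 1) | (0 if col in grouping_set else 1)
    return mask

def aggregate(rows, grouping_sets, include_ordinal):
    """Group rows once per grouping set and return {key: [comm values]}.

    With include_ordinal=False the key is only (mask, values), so two
    identical grouping sets land in the same groups. With
    include_ordinal=True the position of the set in the list is part of
    the key (an assumed fix), so duplicates stay separate.
    """
    groups = defaultdict(list)
    for ordinal, gset in enumerate(grouping_sets):
        mask = null_mask(gset)
        for row in rows:
            # Columns outside the grouping set are reported as NULL (None).
            values = tuple(row[c] if c in gset else None for c in COLUMNS)
            key = (mask, values, ordinal) if include_ordinal else (mask, values)
            groups[key].append(row["comm"])
    return groups

# Same data and grouping sets as in the reproduction above.
ROWS = [
    {"deptno": 10, "job": "CLERK", "sal": 1300, "comm": None},
    {"deptno": 20, "job": "MANAGER", "sal": 3000, "comm": None},
]
GROUPING_SETS = [("deptno", "job"), ("deptno", "sal"), ("deptno", "job")]

merged = aggregate(ROWS, GROUPING_SETS, include_ordinal=False)
preserved = aggregate(ROWS, GROUPING_SETS, include_ordinal=True)
print(len(merged), len(preserved))  # 4 6
```

In this model the two `(deptno, job)` sets produce the same mask and the same group values, so keying on the mask alone yields 4 groups, while a key that also carries the set's position yields the 6 groups PostgreSQL reports.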
