xiedeyantu opened a new issue, #21316:
URL: https://github.com/apache/datafusion/issues/21316

   ### Describe the bug
   
   When `GROUPING SETS` contains duplicate grouping lists, DataFusion incorrectly collapses them during execution. The internal `grouping_id` encodes only the null mask of each grouping set (which columns are grouped and which are nulled out). As a result, repeated grouping sets share the same execution key, their groups are merged, and rows are lost.
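The collision can be illustrated with a small Python model of the sample data used in the reproduction. This is a simplified sketch, not DataFusion's implementation. `null_mask` assumes one bit per column that is absent from the grouping set (with `deptno` as the high bit), and `aggregate_by_null_mask` is a hypothetical stand-in for a hash aggregate whose group key is only that mask plus the grouping values.

```python
from collections import Counter

COLUMNS = ("deptno", "job", "sal")
ROWS = [
    {"deptno": 10, "job": "CLERK", "sal": 1300, "comm": None},
    {"deptno": 20, "job": "MANAGER", "sal": 3000, "comm": None},
]
GROUPING_SETS = [("deptno", "job"), ("deptno", "sal"), ("deptno", "job")]


def null_mask(grouping_set):
    """Set one bit per column NOT in the grouping set (deptno is the high bit)."""
    mask = 0
    for i, col in enumerate(COLUMNS):
        if col not in grouping_set:
            mask |= 1 << (len(COLUMNS) - 1 - i)
    return mask


def aggregate_by_null_mask(rows, grouping_sets):
    """Key groups by (null mask, grouping values) only, so duplicate sets merge."""
    groups = Counter()  # key -> number of input rows folded into that group
    for gs in grouping_sets:
        mask = null_mask(gs)
        for row in rows:
            # Columns outside the grouping set are nulled out, as in the SQL output.
            values = tuple(row[c] if c in gs else None for c in COLUMNS)
            groups[(mask, values)] += 1
    return groups


groups = aggregate_by_null_mask(ROWS, GROUPING_SETS)
print(len(groups))  # 4, although 3 grouping sets x 2 rows would give 6
```

Because both copies of `(deptno, job)` map to mask `1`, their groups land on identical keys, so this model emits 4 groups instead of 6, and each merged group absorbs rows from both copies.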
   
   This behavior differs from PostgreSQL, which preserves duplicate grouping sets and returns duplicated result rows.
   
   ### To Reproduce
   
   Create the sample table:
   ```sql
   create table duplicate_grouping_sets(deptno int, job varchar, sal int, comm int);
   insert into duplicate_grouping_sets values
   (10, 'CLERK', 1300, null),
   (20, 'MANAGER', 3000, null);
   ```
   Run the query:
   ```sql
   select deptno, job, sal, sum(comm), grouping(deptno), grouping(job), grouping(sal)
   from duplicate_grouping_sets
   group by grouping sets ((deptno, job), (deptno, sal), (deptno, job))
   order by deptno, job, sal, grouping(deptno), grouping(job), grouping(sal);
   ```
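   Compare the result with PostgreSQL. Every grouping set includes `deptno`, and the two sample rows have different `deptno` values, so each grouping set produces 2 groups and the three grouping sets should yield 3 × 2 = 6 rows. PostgreSQL returns all 6, including the rows from the duplicate `(deptno, job)` grouping set, while DataFusion currently returns only 4.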
   
   ### Expected behavior
   
   Duplicate `GROUPING SETS` entries should be preserved during execution, and the query result should match PostgreSQL, returning duplicated rows when the grouping set list contains duplicates.
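One possible way to keep duplicates apart is sketched below as a simplified Python model, not DataFusion's implementation: extend the execution key with an occurrence index for each repeated grouping set, while `GROUPING()` continues to read only the null mask. The helper names (`null_mask`, `execution_keys`, `grouping`) and the ordinal scheme are hypothetical; they are not DataFusion's API or a confirmed fix.

```python
from collections import Counter

COLUMNS = ("deptno", "job", "sal")
ROWS = [
    {"deptno": 10, "job": "CLERK", "sal": 1300, "comm": None},
    {"deptno": 20, "job": "MANAGER", "sal": 3000, "comm": None},
]
GROUPING_SETS = [("deptno", "job"), ("deptno", "sal"), ("deptno", "job")]


def null_mask(grouping_set):
    """Set one bit per column NOT in the grouping set (deptno is the high bit)."""
    mask = 0
    for i, col in enumerate(COLUMNS):
        if col not in grouping_set:
            mask |= 1 << (len(COLUMNS) - 1 - i)
    return mask


def execution_keys(grouping_sets):
    """Hypothetical key: (null mask, number of earlier sets with the same mask)."""
    seen = Counter()
    keys = []
    for gs in grouping_sets:
        mask = null_mask(gs)
        keys.append((mask, seen[mask]))
        seen[mask] += 1
    return keys


def grouping(mask, col):
    """GROUPING(col) reads only the null mask, so the ordinal does not affect it."""
    return (mask >> (len(COLUMNS) - 1 - COLUMNS.index(col))) & 1


def aggregate_preserving_duplicates(rows, grouping_sets):
    """Group rows per grouping set, keeping repeated sets in separate groups."""
    groups = Counter()
    for gs, key in zip(grouping_sets, execution_keys(grouping_sets)):
        for row in rows:
            values = tuple(row[c] if c in gs else None for c in COLUMNS)
            groups[(key, values)] += 1
    return groups


keys = execution_keys(GROUPING_SETS)
groups = aggregate_preserving_duplicates(ROWS, GROUPING_SETS)
print(keys)         # [(1, 0), (2, 0), (1, 1)]
print(len(groups))  # 6, matching the row count PostgreSQL returns
```

The ordinal only separates execution groups; the user-visible `grouping(deptno)`, `grouping(job)` and `grouping(sal)` values are still derived from the mask, so both copies of `(deptno, job)` report the same flags, as in PostgreSQL's duplicated rows.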
   
   ### Additional context
   
   _No response_

