the syntax i have posted is the same as sqlserver, oracle, snowflake, presto, and postgresql. The syntax of hive is different from them, but I think we can be compatible with these two ways.
流金岁月 <501204...@qq.com> 于2019年10月24日周四 下午5:00写道: > useful feature. > > > But I think the grammar of Hive may be a reference, it is a bit different > from your case: > SELECT a, b, SUM(c) FROM tab1 GROUP BY a, b GROUPING SETS ( (a,b) ) > > > ref: > https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup > > > > > ------------------ 原始邮件 ------------------ > 发件人: "寒江雪"<yangz...@gmail.com>; > 发送时间: 2019年10月24日(星期四) 中午11:30 > 收件人: "dev"<dev@doris.apache.org>; > > 主题: [Proposal] Enhanced Aggregation, Support Grouping Sets, Rollup andCube > to extend group by statement > > > > The CUBE, ROLLUP, and GROUPING SETS extensions to SQL make querying and > reporting easier and faster. CUBE, ROLLUP, and grouping sets produce a > single result set that is equivalent to a UNION ALL of differently grouped > rows. > > The three GROUPING functions make aggregate calculations more efficient, > thereby enhancing database performance, and scalability. > > GROUPING SETS clause > > GROUPING SETS syntax lets you define multiple groupings in the same > query. GROUP > BY computes all the groupings specified and combines them with UNION ALL. > For example, consider the following statement: > > SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), > (b), ( ) ); > > This statement is equivalent to: > > SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b > UNION > SELECT a, null, SUM( c ) FROM tab1 GROUP BY a > UNION > SELECT null, b, SUM( c ) FROM tab1 GROUP BY b > UNION > SELECT null, null, SUM( c ) FROM tab1 > > This is an example of real query: > > mysql> SELECT * FROM t; > +------+------+------+ > | k1 | k2 | k3 | > +------+------+------+ > | a | A | 1 | > | a | A | 2 | > | a | B | 1 | > | a | B | 3 | > | b | A | 1 | > | b | A | 4 | > | b | B | 1 | > | b | B | 5 | > +------+------+------+ > 8 rows in set (0.01 sec) > > mysql> SELECT k1, k2, SUM(k3) FROM t GROUP BY GROUPING SETS ( (k1, > k2), (k2), (k1), ( ) ); > +------+------+-----------+ > | k1 | k2 | sum(`k3`) | > +------+------+-----------+ > | b | B | 6 | > | a | B | 4 | > | a | A | 3 | > | b | A | 5 | > | NULL | B | 10 | > | NULL | A | 8 | > | a | NULL | 7 | > | b | NULL | 11 | > | NULL | NULL | 18 | > +------+------+-----------+ > 9 rows in set (0.06 sec) > > GROUPING SETS Syntax > > SELECT ... > FROM ... > [ ... ] > GROUP BY GROUPING SETS ( groupSet [ , groupSet [ , ... ] ] ) > [ ... ] > > groupSet ::= { ( expr [ , expr [ , ... ] ] )} > > <expr> > Expression,column name. > > > *ROLLUP clause* > > ROLLUP enables a SELECT statement to calculate multiple levels of subtotals > across a specified group of dimensions. It also calculates a grand total. > ROLLUP is a simple extension to the GROUP BY clause, so its syntax is > extremely easy to use. The ROLLUP extension is highly efficient, adding > minimal overhead to a query. > > ROLLUP appears in the GROUP BY clause in a SELECT statement. Its form is: > > ROLLUP ( e1, e2, e3, ... ) > > This statement is equivalent to GROUPING SETS as followed: > > GROUPING SETS ( > ( e1, e2, e3, ... ), > ... > ( e1, e2 ), > ( e1 ), > ( ) > ) > > ROLLUP Syntax > > SELECT ... > FROM ... > [ ... ] > GROUP BY ROLLUP ( expr [ , expr [ , ... ] ] ) > [ ... ] > > <expr> > Expression,column name. > > *CUBE clause* > > > Like ROLLUP CUBE generates all the subtotals that could be calculated for a > data cube with the specified dimensions. > > CUBE ( e1, e2, e3, ... ) > > e.g. CUBE ( a, b, c ) is equivalent to GROUPING SETS as followed: > > GROUPING SETS ( > ( a, b, c ), > ( a, b ), > ( a, c ), > ( a ), > ( b, c ), > ( b ), > ( c ), > ( ) > ) > > CUBE Syntax > > SELECT ... > FROM ... > [ ... ] > GROUP BY CUBE ( expr [ , expr [ , ... ] ] ) > [ ... ] > > <expr> > Expression,column name. > > GROUPING_ID() Function > > GROUPING_ID describes which of a list of expressions are grouped in a row > produced by a GROUP BY query. GROUPING_ID () returns an integer bitmap > whose lowest N bits may be lit. A lit bit indicates the corresponding > argument is not a grouping column for the given output row. The > lowest-order bit corresponds to argument N, and the N-1th lowest-order > bit corresponds > to argument 1. If the column is a grouping column the bit is 0 else is 1. > > For example: > > mysql> select * from t; > +------+------+------+ > | k1 | k2 | k3 | > +------+------+------+ > | a | A | 1 | > | a | A | 2 | > | a | B | 1 | > | a | B | 3 | > | b | A | 1 | > | b | A | 4 | > | b | B | 1 | > | b | B | 5 | > +------+------+------+ > > grouping sets result: > > mysql> SELECT k1, k2, GROUPING_ID(), SUM(k3) FROM t GROUP BY GROUPING > SETS ( (k1, k2), (k2), (k1), ( ) ); > +------+------+---------------+-----------+ > | k1 | k2 | grouping_id() | sum(`k3`) | > +------+------+---------------+-----------+ > | a | A | 0 | 3 | > | a | B | 0 | 4 | > | a | NULL | 1 | 7 | > | b | A | 0 | 5 | > | b | B | 0 | 6 | > | b | NULL | 1 | 11 | > | NULL | A | 2 | 8 | > | NULL | B | 2 | 10 | > | NULL | NULL | 3 | 18 | > +------+------+---------------+-----------+ > 9 rows in set (0.02 sec) > > > *Solution* > > For GROUPING SET is equivalent to the UNION of GROUP BY . So we can expand > input rows, and run an GROUP BY on these rows。 > > For example: > > SELECT a, b FROM src GROUP BY a, b GROUPING SETS ((a, b), (a), (b), ()); > > Data in table src : > > 1, 2 > 3, 4 > > Base on GROUPING SETS , we can expend the input to: > > 1, 2 (GROUPING_ID: a, b -> 00 -> 0) > 1, null (GROUPING_ID: a, null -> 01 -> 1) > null, 2 (GROUPING_ID: null, b -> 10 -> 2) > null, null (GROUPING_ID: null, null -> 11 -> 3) > > 3, 4 (GROUPING_ID: a, b -> 00 -> 0) > 3, null (GROUPING_ID: a, null -> 01 -> 1) > null, 4 (GROUPING_ID: null, b -> 10 -> 2) > null, null (GROUPING_ID: null, null -> 11 -> 3) > > And then use those row as input, then GROUP BY a, b, GROUPING_ID