??????[Proposal] Enhanced Aggregation, Support Grouping Sets, Rollup andCube to extend group by statement

???????? Thu, 24 Oct 2019 02:01:41 -0700

useful feature.


But I think the grammar of Hive may be a reference, it is a bit different from 
your case:
SELECT a, b, SUM(c) FROM tab1 GROUP BY a, b GROUPING SETS ( (a,b) )


ref: 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup




------------------ ???????? ------------------
??????: "??????"<[email protected]>;
????????: 2019??10??24??(??????) ????11:30
??????: "dev"<[email protected]>;

????: [Proposal] Enhanced Aggregation, Support Grouping Sets, Rollup andCube to 
extend group by statement



The CUBE, ROLLUP, and GROUPING SETS  extensions to SQL make querying and
reporting easier and faster. CUBE, ROLLUP, and grouping sets produce a
single result set that is equivalent to a UNION ALL of differently grouped
rows.

The three GROUPING functions  make aggregate calculations more efficient,
thereby enhancing database performance, and scalability.

GROUPING SETS clause

GROUPING SETS syntax lets you define multiple groupings in the same
query. GROUP
BY computes all the groupings specified and combines them with UNION ALL.
For example, consider the following statement:

SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a),
(b), ( ) );

This statement is equivalent to:

SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b
UNION
SELECT a, null, SUM( c ) FROM tab1 GROUP BY a
UNION
SELECT null, b, SUM( c ) FROM tab1 GROUP BY b
UNION
SELECT null, null, SUM( c ) FROM tab1

This is an example of real query??

mysql> SELECT * FROM t;
+------+------+------+
| k1   | k2   | k3   |
+------+------+------+
| a    | A    |    1 |
| a    | A    |    2 |
| a    | B    |    1 |
| a    | B    |    3 |
| b    | A    |    1 |
| b    | A    |    4 |
| b    | B    |    1 |
| b    | B    |    5 |
+------+------+------+
8 rows in set (0.01 sec)

mysql> SELECT k1, k2, SUM(k3) FROM t GROUP BY GROUPING SETS ( (k1,
k2), (k2), (k1), ( ) );
+------+------+-----------+
| k1   | k2   | sum(`k3`) |
+------+------+-----------+
| b    | B    |         6 |
| a    | B    |         4 |
| a    | A    |         3 |
| b    | A    |         5 |
| NULL | B    |        10 |
| NULL | A    |         8 |
| a    | NULL |         7 |
| b    | NULL |        11 |
| NULL | NULL |        18 |
+------+------+-----------+
9 rows in set (0.06 sec)

GROUPING SETS Syntax

SELECT ...
FROM ...
[ ... ]
GROUP BY GROUPING SETS ( groupSet [ , groupSet [ , ... ] ] )
[ ... ]

groupSet ::= { ( expr  [ , expr [ , ... ] ] )}

<expr>
Expression??column name.


*ROLLUP clause*

ROLLUP enables a SELECT statement to calculate multiple levels of subtotals
across a specified group of dimensions. It also calculates a grand total.
ROLLUP is a simple extension to the GROUP BY clause, so its syntax is
extremely easy to use. The ROLLUP extension is highly efficient, adding
minimal overhead to a query.

ROLLUP appears in the GROUP BY clause in a SELECT statement. Its form is:

ROLLUP ( e1, e2, e3, ... )

This statement is equivalent to GROUPING SETS as followed??

GROUPING SETS (
( e1, e2, e3, ... ),
...
( e1, e2 ),
( e1 ),
( )
)

 ROLLUP Syntax

SELECT ...
FROM ...
[ ... ]
GROUP BY ROLLUP ( expr  [ , expr [ , ... ] ] )
[ ... ]

<expr>
Expression??column name.

*CUBE clause*


Like ROLLUP CUBE generates all the subtotals that could be calculated for a
data cube with the specified dimensions.

CUBE ( e1, e2, e3, ... )

e.g. CUBE ( a, b, c ) is equivalent to GROUPING SETS as followed:

GROUPING SETS (
( a, b, c ),
( a, b ),
( a,    c ),
( a       ),
(    b, c ),
(    b    ),
(       c ),
(         )
)

CUBE Syntax

SELECT ...
FROM ...
[ ... ]
GROUP BY CUBE ( expr  [ , expr [ , ... ] ] )
[ ... ]

<expr>
Expression??column name.

GROUPING_ID() Function

GROUPING_ID describes which of a list of expressions are grouped in a row
produced by a GROUP BY query. GROUPING_ID () returns an integer bitmap
whose lowest N bits may be lit. A lit bit indicates the corresponding
argument is not a grouping column for the given output row. The
lowest-order bit corresponds to argument N, and the N-1th lowest-order
bit corresponds
to argument 1. If the column is a grouping column the bit is 0 else is 1.

For example??

mysql> select * from t;
+------+------+------+
| k1   | k2   | k3   |
+------+------+------+
| a    | A    |    1 |
| a    | A    |    2 |
| a    | B    |    1 |
| a    | B    |    3 |
| b    | A    |    1 |
| b    | A    |    4 |
| b    | B    |    1 |
| b    | B    |    5 |
+------+------+------+

grouping sets result??

mysql> SELECT k1, k2, GROUPING_ID(), SUM(k3) FROM t GROUP BY GROUPING
SETS ( (k1, k2), (k2), (k1), ( ) );
+------+------+---------------+-----------+
| k1   | k2   | grouping_id() | sum(`k3`) |
+------+------+---------------+-----------+
| a    | A    |             0 |         3 |
| a    | B    |             0 |         4 |
| a    | NULL |             1 |         7 |
| b    | A    |             0 |         5 |
| b    | B    |             0 |         6 |
| b    | NULL |             1 |        11 |
| NULL | A    |             2 |         8 |
| NULL | B    |             2 |        10 |
| NULL | NULL |             3 |        18 |
+------+------+---------------+-----------+
9 rows in set (0.02 sec)


*Solution*

For GROUPING SET is equivalent to the UNION of GROUP BY . So we can expand
input rows, and run an GROUP BY on these rows??

For example??

SELECT a, b FROM src GROUP BY a, b GROUPING SETS ((a, b), (a), (b), ());

Data in table src :

1, 2
3, 4

Base on GROUPING SETS , we can expend the input to??

1, 2       (GROUPING_ID: a, b -> 00 -> 0)
1, null    (GROUPING_ID: a, null -> 01 -> 1)
null, 2    (GROUPING_ID: null, b -> 10 -> 2)
null, null (GROUPING_ID: null, null -> 11 -> 3)

3, 4       (GROUPING_ID: a, b -> 00 -> 0)
3, null    (GROUPING_ID: a, null -> 01 -> 1)
null, 4    (GROUPING_ID: null, b -> 10 -> 2)
null, null (GROUPING_ID: null, null -> 11 -> 3)

And then use those row as input, then GROUP BY a, b, GROUPING_ID

??????[Proposal] Enhanced Aggregation, Support Grouping Sets, Rollup andCube to extend group by statement

Reply via email to