Stamatis Zampetakis created CALCITE-7446:
--------------------------------------------

             Summary: Add rule to pushdown conditions from filtered aggregates
                 Key: CALCITE-7446
                 URL: https://issues.apache.org/jira/browse/CALCITE-7446
             Project: Calcite
          Issue Type: Improvement
          Components: core
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Add a new rule, AggregateExtractFilterRule, to extract and pushdown conditions 
from filtered aggregates.

In SQL syntax, the transformation on a very simple query is outlined below.

+Before+
{code:sql}
SELECT
  SUM(sal) FILTER (WHERE deptno = 10),
  SUM(sal) FILTER (WHERE deptno = 20),
FROM emp
{code}
+After+
{code:sql}
SELECT
  SUM(sal) FILTER (WHERE deptno = 10),
  SUM(sal) FILTER (WHERE deptno = 20),
FROM emp
WHERE deptno = 10 OR deptno = 20
{code}
For the transformation to be valid all aggregate functions must have a filter 
condition.
The initial version of the rule will only deal with grand totals (no GROUP BY).

At first the additional filter in the WHERE clause may appear somewhat 
redundant but in practice it can help to greatly reduce the rows that reach the 
aggregation functions especially if the table is large. In combination with 
other rules/techniques (such as index selection, partition pruning, filter 
pushdown) it can make a big different in performance.

Below a slightly more complex example showcasing how the new rule combined with 
the existing JOIN pushdown rules can lead to a better plan.
{code:sql}
SELECT
  SUM(e.sal) FILTER (WHERE d.name = 'SALES'),
  SUM(e.sal) FILTER (WHERE d.name = 'RESEARCH')
FROM emp e JOIN dept d ON e.deptno = d.deptno
{code}
Currently, the aggregate processes all rows that come from the join of {{emp}} 
and {{dept}} tables. However, the aggregate will throw away all rows that don't 
involve the SALES and RESEARCH department so we can pushdown these conditions 
below the join and just above the scan of the {{dept}} table.

+Before+
{noformat}
LogicalAggregate(group=[{}], EXPR$0=[SUM($0) FILTER $1], EXPR$1=[SUM($0) FILTER 
$2])
  LogicalProject(SAL=[$5], $f1=[=($10, 'SALES')], $f2=[=($10, 'RESEARCH')])
    LogicalJoin(condition=[=($7, $9)], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}
+After+
{noformat}
LogicalAggregate(group=[{}], EXPR$0=[SUM($0) FILTER $1], EXPR$1=[SUM($0) FILTER 
$2])
  LogicalProject(SAL=[$5], $f1=[=($10, 'SALES')], $f2=[=($10, 'RESEARCH')])
    LogicalJoin(condition=[=($7, $9)], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalFilter(condition=[SEARCH($1, Sarg['RESEARCH':VARCHAR(10), 
'SALES':VARCHAR(10)]:VARCHAR(10))])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}
The JOIN example above is purely for demonstration purposes. The new rule is 
not gonna have any specific logic for handling join or other operators. The 
pushdown of the filter below the join (if possible) will be handled by the 
existing rules.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to