nathanb9 commented on PR #23175:
URL: https://github.com/apache/datafusion/pull/23175#issuecomment-4806406668
# Query / Computation Fusion Optimizer Ideas
Inspired by Athena's computation-reuse fusion paper.
## 1. Scalar Aggregate Fusion
Merge multiple scalar aggregate subqueries over the same source into one
aggregate with `FILTER` clauses.
Example query: TPC-DS Q9
Before:
```sql
SELECT
(SELECT avg(x) FROM R WHERE p1),
(SELECT avg(y) FROM R WHERE p2);
```
After:
```sql
SELECT
avg(x) FILTER (WHERE p1),
avg(y) FILTER (WHERE p2)
FROM R;
```
## 2. Grouped Aggregate Join To Window Aggregate
Replace a join back to a grouped aggregate over the same source with a
window aggregate.
Example query: TPC-DS Q65
Before:
```sql
SELECT R.*, A.avg_v
FROM R
JOIN (
SELECT k, avg(v) AS avg_v
FROM R
GROUP BY k
) A
ON R.k = A.k;
```
After:
```sql
SELECT
R.*,
avg(v) OVER (PARTITION BY k) AS avg_v
FROM R
WHERE k IS NOT NULL;
```
## 3. UNION ALL Branch Fusion
Merge same-source `UNION ALL` branches into one scan plus branch tags and
branch-specific predicates.
Example query: TPC-DS Q23-style shapes
Before:
```sql
SELECT ... FROM R WHERE p1
UNION ALL
SELECT ... FROM R WHERE p2;
```
After:
```sql
SELECT ...
FROM R
CROSS JOIN branches
WHERE (branch = 1 AND p1)
OR (branch = 2 AND p2);
```
## 4. Multi-Aggregate Common-Source Fusion
Merge sibling aggregates over the same source and group keys into one
aggregate with multiple aggregate expressions.
Example queries: TPC-DS Q28 / Q88
Before:
```sql
SELECT ...
FROM (
SELECT k, agg1(v) FROM R GROUP BY k
) a,
(
SELECT k, agg2(v) FROM R GROUP BY k
) b;
```
After:
```sql
SELECT
k,
agg1(v),
agg2(v)
FROM R
GROUP BY k;
```
Note: Q28-style cases may need `DISTINCT` / `MarkDistinct` handling.
## 5. Common Join-Input Fusion
Detect joins or subqueries that repeatedly consume the same source and share
the common input instead of recomputing it.
Example queries: TPC-DS Q64 / Q88
Before:
```sql
SELECT ...
FROM (R JOIN A) x,
(R JOIN B) y;
```
After:
```sql
SELECT ...
FROM R
JOIN A
JOIN B;
```
This should start conservatively behind a config flag and require semantic
guards for `NULL`s, `DISTINCT`, outer joins, limits, ordering, and volatile
expressions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]