GitHub user avamingli created a discussion: Give UNION ALL more opportunities
for parallel plans in MPP.
### Description
In CBDB's PostgreSQL planner, a UNION ALL can currently be processed in four
ways: three parallel variants (Parallel Append with partial subpaths, Parallel
Append with mixed partial and non-partial subpaths, and Append with partial
subpaths) and one non-parallel variant (Append with non-partial subpaths).
1. Parallel Append with partial subpaths
```sql
EXPLAIN (COSTS OFF)
SELECT unique1 FROM tenk1 WHERE fivethous = tenthous + 1
UNION ALL
SELECT unique1 FROM tenk1 WHERE fivethous = tenthous + 1;
QUERY PLAN
----------------------------------------------------
Gather
Workers Planned: 4
-> Parallel Append
-> Parallel Seq Scan on tenk1
Filter: (fivethous = (tenthous + 1))
-> Parallel Seq Scan on tenk1 tenk1_1
Filter: (fivethous = (tenthous + 1))
(7 rows)
```
2. Parallel Append with mixed partial/non-partial subpaths
```sql
explain (costs off)
select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
--------------------------------------------------------------
Finalize Aggregate
-> Gather
Workers Planned: 3
-> Partial Aggregate
-> Parallel Append
-> Seq Scan on d_star a_star_4
-> Seq Scan on c_star a_star_3
-> Parallel Seq Scan on f_star a_star_6
-> Parallel Seq Scan on e_star a_star_5
-> Parallel Seq Scan on b_star a_star_2
-> Parallel Seq Scan on a_star a_star_1
(11 rows)
```
3. Append with partial subpaths
```sql
explain (costs off)
select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
--------------------------------------------------------------
Finalize Aggregate
-> Gather
Workers Planned: 1
-> Partial Aggregate
-> Append
-> Parallel Seq Scan on a_star a_star_1
-> Parallel Seq Scan on b_star a_star_2
-> Parallel Seq Scan on c_star a_star_3
-> Parallel Seq Scan on d_star a_star_4
-> Parallel Seq Scan on e_star a_star_5
-> Parallel Seq Scan on f_star a_star_6
(11 rows)
```
4. Append with non-partial subpaths
```sql
explain (costs off)
select * from
(select *, 0 as x from int8_tbl a
union all
select *, 1 as x from int8_tbl b) ss
where (x = 0) or (q1 >= q2 and q1 <= q2);
QUERY PLAN
---------------------------------------------
Append
-> Seq Scan on int8_tbl a
-> Seq Scan on int8_tbl b
Filter: ((q1 >= q2) AND (q1 <= q2))
(4 rows)
```
While these strategies work well for local queries, we have had to disable
parallel execution whenever a Motion node appears in a subpath, due to a
critical correctness issue: Parallel Append workers mark subnodes as completed,
which can cause Motion-containing branches to be skipped prematurely.
This limitation forces serial execution for most distributed table queries. In
the example plan below, a Gather Motion wraps a serial Append with a
Redistribute Motion in each branch, missing significant optimization
opportunities.
```sql
explain(costs off) select b, count(*) from t1 group by b union all
select b, count(*) from t2 group by b;
QUERY PLAN
------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
-> Append
-> HashAggregate
Group Key: t1.b
-> Redistribute Motion 3:3 (slice2; segments: 3)
Hash Key: t1.b
-> Seq Scan on t1
-> HashAggregate
Group Key: t2.b
-> Redistribute Motion 3:3 (slice3; segments: 3)
Hash Key: t2.b
-> Seq Scan on t2
Optimizer: Postgres query optimizer
(13 rows)
```
However, there is still an opportunity to run such queries in parallel.
I propose a robust solution: first attempt a parallel-aware Append when it is
safe (no Motion nodes in any subpath), then automatically fall back to a
parallel-oblivious Append when a Motion hazard is detected. This works because
a regular Append reliably executes all of its subnodes regardless of Motion
presence, and CBDB's Motion nodes handle tuples individually without requiring
coordination between workers.
The benefits extend beyond the UNION ALL operator itself: enabling this
parallelization creates ripple effects in which subpaths gain further parallel
execution opportunities, which is particularly valuable for complex nested
queries. This optimization stands to significantly improve performance on the
TPC-DS benchmark and other analytical workloads involving distributed tables.
We're preparing to implement this with prototype development, TPC-DS testing,
and edge case validation. Community feedback is welcome on potential corner
cases, benchmarking approaches, and real-world query patterns that might
benefit most.
### Use case/motivation
_No response_
### Related issues
_No response_
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
GitHub link: https://github.com/apache/cloudberry/discussions/1291
----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]