GitHub user avamingli edited a discussion: Make UNION Parallel
### Description
In PostgreSQL, the UNION operator can leverage parallel processing through the
Parallel Append node with subnodes.
```sql
explain (costs off, verbose) select * from t1 union select * from t2;
QUERY PLAN
--------------------------------------------------
HashAggregate
Output: t1.a, t1.b
Group Key: t1.a, t1.b
-> Gather
Output: t1.a, t1.b
Workers Planned: 3
-> Parallel Append
-> Parallel Seq Scan on public.t1
Output: t1.a, t1.b
-> Parallel Seq Scan on public.t2
Output: t2.a, t2.b
(11 rows)
```
However, in CBDB, we face challenges when UNION subqueries include Motion
nodes, which can lead to incorrect results if Parallel Append is used. This is
due to the competition among workers for subnodes: some subnodes are executed
by a single worker while others may be processed by multiple workers.
When a worker completes a task for a multi-worker subnode, it marks that job as
finished.
While this is acceptable in PostgreSQL, but in CBDB, the presence of Motion
nodes complicates this process. The premature marking of completion can cause
the Motion Sender to stop early, resulting in data loss for other workers.
For cases involving only Scan nodes—such as Parallel Scans on partitioned
tables—this issue does not arise.
However, for UNION ALL and similar scenarios, the use of Parallel Append is
unsafe. That's another problem and we should consider disabling it in the
future.
Despite these challenges, as an MPP database, CBDB has the potential to support
parallel processing for UNION operations. This can be achieved if multiple
workers execute Append nodes within a slice and the data is well-distributed
(hashed across multiple segments relative to the cluster) according to the
output columns of the subqueries.
Specifically, we can implement a Parallel-oblivious Append approach, where
multiple workers operate independently without sharing state, rather than a
Parallel-aware Append that requires coordination among workers.
### Use case/motivation
_No response_
### Related issues
_No response_
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
GitHub link: https://github.com/apache/cloudberry/discussions/1202
----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]