GitHub user avamingli created a discussion: [Ideas] Switch to Streaming Hashagg for multiple phase aggregation in postgres planner.
### Description

Currently, the Postgres planner uses streaming aggregation only in specific scenarios, such as DISTINCT operations or when parallelism is involved. The capability is not generally available for other multi-phase aggregation plans.

Recent TPC-DS benchmarking against the Orca optimizer has revealed a consistent performance gap: even when the final plans are very similar, Orca tends to be faster. The root cause has been identified: Orca, by design, defaults to Streaming Hash Aggregation for multi-phase aggregations, while the Postgres planner does not.

**Analysis of Streaming vs. Non-Streaming Aggregation**

The choice between streaming and non-streaming (hash) aggregation involves a trade-off:

* Benefits of Streaming: It can push aggregation computations higher up the plan tree, which often enhances parallelism and reduces the need for heavy data materialization.
* Drawbacks of Streaming: It may convert potential disk I/O pressure into network pressure, as partially grouped data is streamed between nodes.

It is hard for the planner to weigh this trade-off precisely because it depends on data distribution and duplication. However, the empirical data from our benchmarks is clear: streaming is a net win overall. In the best case for streaming, where the data on each segment is largely unique, the non-streaming mode may force a full spill of all data to disk, while the streaming mode incurs almost no additional overhead.

I therefore propose switching the Postgres planner to use Streaming Hash Aggregation for these cases, matching Orca's approach and gaining the same performance benefit.

### Use case/motivation

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

GitHub link: https://github.com/apache/cloudberry/discussions/1411
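
To make the disk-versus-network trade-off in the analysis concrete, here is a toy Python model of the partial (first-phase) aggregate on one segment with a bounded hash table. It is only a sketch under simplifying assumptions, not the planner or executor code: the function names, the `capacity` parameter, and the naive re-spill loop are all hypothetical, and a real hash aggregate partitions its spill files rather than rescanning them like this.

```python
from collections import Counter


def spilling_partial_agg(keys, capacity):
    """Non-streaming model: rows whose group does not fit are spilled and re-read.

    Returns (partial_rows_sent_downstream, rows_written_to_disk).
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    pending = [(key, 1) for key in keys]
    emitted, spilled_rows = [], 0
    while pending:
        table, overflow = {}, []
        for key, count in pending:
            if key in table or len(table) < capacity:
                table[key] = table.get(key, 0) + count
            else:
                overflow.append((key, count))  # stands in for a spill file
        spilled_rows += len(overflow)
        emitted.extend(table.items())  # each group is emitted exactly once
        pending = overflow
    return emitted, spilled_rows


def streaming_partial_agg(keys, capacity):
    """Streaming model: when the table is full, flush it downstream instead of spilling.

    Returns (partial_rows_sent_downstream, rows_written_to_disk).
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    table, emitted = {}, []
    for key in keys:
        if key not in table and len(table) >= capacity:
            emitted.extend(table.items())  # send partial groups to the next phase
            table.clear()
        table[key] = table.get(key, 0) + 1
    emitted.extend(table.items())
    return emitted, 0  # this model never writes to disk


def final_agg(partials):
    """Second phase: merge partial counts that share a key."""
    totals = Counter()
    for key, count in partials:
        totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    workloads = {
        "mostly unique": list(range(1000)),
        "duplicated": [i % 200 for i in range(1000)],
    }
    for name, keys in workloads.items():
        for agg in (spilling_partial_agg, streaming_partial_agg):
            partials, spilled = agg(keys, capacity=100)
            print(name, agg.__name__, "sent:", len(partials), "spilled:", spilled)
```

In this model both strategies give the same final result once `final_agg` merges the partial rows. With mostly unique keys, the streaming variant sends the same number of rows downstream but writes nothing to disk; with heavily duplicated keys that exceed the table capacity, it sends more partial rows downstream in exchange for avoiding the spill.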
