yashmayya opened a new pull request, #16515: URL: https://github.com/apache/pinot/pull/16515
- This rule tends to hurt performance because it can cause a larger number of rows to be shuffled across the network.
- The rule aims to reduce the total amount of work done by merging multiple aggregates into a single one. However, in a distributed database like Pinot, the cost of data shuffle usually dominates in larger clusters and workloads.
- Pushing the distinct / empty aggregate into the leaf stage, rather than pulling it up to be combined with the empty aggregate above the `UNION ALL`, can significantly reduce the amount of data shuffled. (`UNION` in Pinot is always converted into `UNION ALL` with an empty grouping aggregate on top to eliminate duplicates.)
- The rule can still be selectively enabled using the query option `usePlannerRules`, which could be useful if the cardinality is similar to the total count (in that case, deduplicating in the leaf stage removes few rows).
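The shuffle argument above can be illustrated with a toy model. The sketch below is not Pinot code and does not model the planner; the function names (`shuffled_rows`, `union_distinct`) and the partition sizes are hypothetical, chosen only to show how many rows each leaf would send upstream with and without a leaf-stage distinct.

```python
def shuffled_rows(leaf_partitions, distinct_in_leaf):
    """Count rows sent from the leaves to the stage above the UNION ALL.

    Toy assumption: every row a leaf emits crosses the network once.
    """
    total = 0
    for rows in leaf_partitions:
        # With the distinct kept in the leaf, each leaf sends only its unique values.
        total += len(set(rows)) if distinct_in_leaf else len(rows)
    return total


def union_distinct(leaf_partitions):
    """Final UNION result: the aggregate on top removes duplicates across leaves."""
    result = set()
    for rows in leaf_partitions:
        result.update(rows)
    return result


# Low cardinality: 4 leaves, 10,000 rows each, only 100 distinct values.
low_card = [[i % 100 for i in range(10_000)] for _ in range(4)]

# High cardinality: 4 leaves, 10,000 rows each, every value unique.
high_card = [list(range(p * 10_000, (p + 1) * 10_000)) for p in range(4)]

if __name__ == "__main__":
    for name, data in (("low", low_card), ("high", high_card)):
        print(
            name,
            "distinct in leaf:", shuffled_rows(data, True),
            "merged on top:", shuffled_rows(data, False),
            "result size:", len(union_distinct(data)),
        )
```

In this model the low-cardinality case shuffles far fewer rows when the distinct stays in the leaf, while the high-cardinality case shuffles the same number either way, which matches the case where re-enabling the rule through `usePlannerRules` might be worthwhile.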
