yashmayya opened a new pull request, #16515:
URL: https://github.com/apache/pinot/pull/16515

   - This rule tends to hurt performance because it can cause a larger number 
of rows to be shuffled across the network.
   - The rule optimizes for reducing the total amount of work by merging 
multiple aggregates into a single one. However, in a distributed database 
like Pinot, the cost of data shuffle usually dominates in larger clusters and 
workloads.
   - `UNION` in Pinot is always converted into `UNION ALL` with an empty 
grouping aggregate on top to eliminate duplicates. Pushing the distinct / 
empty aggregate into the leaf stage, rather than pulling it up to be combined 
with the empty aggregate above the `UNION ALL`, can significantly reduce the 
amount of data shuffled.
   - The rule can still be selectively enabled using the query option 
`usePlannerRules`, which could be useful if the number of distinct values is 
close to the total row count, since the leaf-stage aggregate then removes few 
rows before the shuffle.
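
To make the shuffle trade-off concrete, here is a minimal toy model in Python. It is not Pinot code: the function names and the sample data are hypothetical, and it only counts how many rows would cross the network under each plan shape, assuming each input of the `UNION` is a single leaf stage.

```python
# Toy model only: counts rows crossing the network for a UNION of two inputs.
# Not Pinot code; all names and data here are hypothetical.

def shuffle_with_aggregate_pulled_up(left, right):
    """Aggregates merged above UNION ALL: every raw row is shuffled."""
    shuffled = list(left) + list(right)
    return len(shuffled), set(shuffled)

def shuffle_with_aggregate_in_leaf(left, right):
    """Distinct applied in each leaf first: only distinct values are shuffled."""
    shuffled = list(set(left)) + list(set(right))
    # A final distinct above the UNION ALL still removes cross-input duplicates.
    return len(shuffled), set(shuffled)

# Low cardinality: 10 distinct values spread over 1,000 rows per input.
left = [i % 10 for i in range(1000)]
right = [i % 10 + 5 for i in range(1000)]

pulled_count, pulled_result = shuffle_with_aggregate_pulled_up(left, right)
pushed_count, pushed_result = shuffle_with_aggregate_in_leaf(left, right)
print(pulled_count, pushed_count)  # 2000 20

# High cardinality: every row is distinct, so the leaf aggregate removes nothing.
unique_left = list(range(1000))
unique_right = list(range(1000, 2000))
unique_pulled_count, _ = shuffle_with_aggregate_pulled_up(unique_left, unique_right)
unique_pushed_count, _ = shuffle_with_aggregate_in_leaf(unique_left, unique_right)
print(unique_pulled_count, unique_pushed_count)  # 2000 2000
```

Both plan shapes produce the same result set. In this sketch only the number of shuffled rows differs, and the difference disappears when nearly every row is distinct, which is the case where re-enabling the rule might make sense.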


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

