somandal opened a new pull request, #10846: URL: https://github.com/apache/pinot/pull/10846
This work is a collaboration between @vvivekiyer and myself *Important Note:* This PR must be rebased on top of @vvivekiyer 's runtime changes (PR: https://github.com/apache/pinot/pull/10845) and integrated before it can be merged Motivation: - Today only a subset of aggregation functions (sum, min, max, count, avg, count(distinct), kurtosis, and skewness) work in the multistage engine. - To support each new aggregation function changes are required on both the planner and runtime side. Pinot contributors should not need to understand the nitty gritty details of the planner and execution engine to add new aggregation functions. - The multistage `AggregateOperator` essentially reimplements the aggregation logic rather than reusing the aggregation logic present in the v1 engine functions. This results in code duplication and a high maintenance overhead (e.g. bug fixes must be made in two places which may even have different logic or different mechanisms to support things like null handling). - The v1 execution engine supports a large number of aggregation functions and it is not practical to manually add support for each new aggregation function to multistage. This also results in a situation in which each new aggregation function needs to be added to both the v1 engine and the multistage engine, or if support is only added for the v1 engine, the multistage engine must constantly play catch up. To get around the above limitations we are refactoring the aggregation function framework to work for both the v1 and multistage engine. This PR tackles the changes required on the planner side to make the planner more generic. This is phase 1 of the planned set of changed to support the aggregation functions and tackles only the existing set of aggregation functions already supported in the multistage engine. - OSS Issue: https://github.com/apache/pinot/issues/10745 - Design document: https://docs.google.com/document/d/1Us6aBvTpNLMEy0ODo34OgTk73h_LVFFAH6q17689h1M/edit?usp=sharing - @vvivekiyer 's runtime changes to support this: https://github.com/apache/pinot/pull/10845 Types of functions this PR does *not* support (and for which support will be added later on): - Add support for taking function arguments as literals (e.g. PERCENTILE) - Add support for aggregation functions which take more than one column as arguments (e.g . COVARIANCE) - Arg min/max type aggregation functions using the Parent-Child relationship -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org