somandal opened a new pull request, #10846:
URL: https://github.com/apache/pinot/pull/10846

   This work is a collaboration between @vvivekiyer and myself
   *Important Note:* This PR must be rebased on top of @vvivekiyer 's runtime 
changes (PR: https://github.com/apache/pinot/pull/10845) and integrated before 
it can be merged
   
   Motivation:
   
   - Today only a subset of aggregation functions (sum, min, max, count, avg, 
count(distinct), kurtosis, and skewness) work in the multistage engine.
   - To support each new aggregation function changes are required on both the 
planner and runtime side. Pinot contributors should not need to understand the 
nitty gritty details of the planner and execution engine to add new aggregation 
functions.
   - The multistage `AggregateOperator` essentially reimplements the 
aggregation logic rather than reusing the aggregation logic present in the v1 
engine functions. This results in code duplication and a high maintenance 
overhead (e.g. bug fixes must be made in two places which may even have 
different logic or different mechanisms to support things like null handling).
   - The v1 execution engine supports a large number of aggregation functions 
and it is not practical to manually add support for each new aggregation 
function to multistage. This also results in a situation in which each new 
aggregation function needs to be added to both the v1 engine and the multistage 
engine, or if support is only added for the v1 engine, the multistage engine 
must constantly play catch up.
   
   To get around the above limitations we are refactoring the aggregation 
function framework to work for both the v1 and multistage engine. This PR 
tackles the changes required on the planner side to make the planner more 
generic. This is phase 1 of the planned set of changed to support the 
aggregation functions and tackles only the existing set of aggregation 
functions already supported in the multistage engine.
   
   - OSS Issue: https://github.com/apache/pinot/issues/10745
   - Design document: 
https://docs.google.com/document/d/1Us6aBvTpNLMEy0ODo34OgTk73h_LVFFAH6q17689h1M/edit?usp=sharing
   - @vvivekiyer 's runtime changes to support this: 
https://github.com/apache/pinot/pull/10845
   
   Types of functions this PR does *not* support (and for which support will be 
added later on):
   
   - Add support for taking function arguments as literals (e.g. PERCENTILE)
   - Add support for aggregation functions which take more than one column as 
arguments (e.g . COVARIANCE)
   - Arg min/max type aggregation functions using the Parent-Child relationship


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to