rdblue commented on PR #6405:
URL: https://github.com/apache/iceberg/pull/6405#issuecomment-1345703757

   @huaxingao, I was looking at #6252 and I wanted to try out implementing 
aggregation in either the core or API modules so that the majority of the logic 
could be shared rather than needing to implement it in every processing engine.
   
   Could you please take a look at this and see if it seems reasonable?
   
   The basic idea is to use `BoundAggregate` to do two things:
   1. Extract a value to aggregate in `eval(StructLike)` or `eval(DataFile)`, 
which is similar to how `eval` is used for other expressions
   2. Create an `Aggregator` that keeps track of the aggregate state
   
   This also adds an `AggregateEvaluator` that operates on a list of 
aggregate expressions:
   * `aggEval = AggregateEvaluator.create(tableSchema, expressions)` binds the 
expressions and creates an aggregator for each one
   * `aggEval.update(StructLike)` and `aggEval.update(DataFile)` update each 
expression's aggregator
   * `aggEval.result()` returns a `StructLike` with the aggregated values
   * `aggEval.resultType()` returns a `StructType` for the aggregated values
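
   To make the split of responsibilities concrete, here is a minimal, 
self-contained sketch of the pattern. It is not the code in this PR: every 
type below (`AggregateSketch`, `Slot`, `Evaluator`, and the local 
`BoundAggregate` and `Aggregator` interfaces) is a hypothetical stand-in, a 
`Map<String, Object>` is assumed in place of `StructLike`, and the `DataFile` 
path, schema binding, and `resultType()` are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the Iceberg classes.
class AggregateSketch {
  /** Keeps the running state for one aggregate. */
  interface Aggregator<T> {
    void update(T value);
    T result();
  }

  /** Role 1: extract the value to aggregate from a row. Role 2: create its aggregator. */
  interface BoundAggregate<T> {
    T eval(Map<String, Object> row);
    Aggregator<T> newAggregator();
  }

  /** count(column): eval yields 1 for a non-null value, 0 otherwise; the aggregator sums. */
  static BoundAggregate<Long> count(String column) {
    return new BoundAggregate<Long>() {
      public Long eval(Map<String, Object> row) {
        return row.get(column) != null ? 1L : 0L;
      }
      public Aggregator<Long> newAggregator() {
        return new Aggregator<Long>() {
          private long total = 0L;
          public void update(Long value) { total += value; }
          public Long result() { return total; }
        };
      }
    };
  }

  /** max(column): eval yields the raw value; the aggregator keeps the largest non-null one. */
  static BoundAggregate<Integer> max(String column) {
    return new BoundAggregate<Integer>() {
      public Integer eval(Map<String, Object> row) {
        return (Integer) row.get(column);
      }
      public Aggregator<Integer> newAggregator() {
        return new Aggregator<Integer>() {
          private Integer current = null;
          public void update(Integer value) {
            if (value != null && (current == null || value > current)) current = value;
          }
          public Integer result() { return current; }
        };
      }
    };
  }

  /** Pairs one aggregate expression with its state so the element types stay aligned. */
  private static final class Slot<T> {
    private final BoundAggregate<T> aggregate;
    private final Aggregator<T> aggregator;
    Slot(BoundAggregate<T> aggregate) {
      this.aggregate = aggregate;
      this.aggregator = aggregate.newAggregator();
    }
    void update(Map<String, Object> row) { aggregator.update(aggregate.eval(row)); }
    Object result() { return aggregator.result(); }
  }

  private static <T> Slot<T> slot(BoundAggregate<T> aggregate) {
    return new Slot<>(aggregate);
  }

  /** Engine-independent driver: feeds each row to every aggregator, then collects results. */
  static final class Evaluator {
    private final List<Slot<?>> slots = new ArrayList<>();
    Evaluator(List<BoundAggregate<?>> aggregates) {
      for (BoundAggregate<?> aggregate : aggregates) slots.add(slot(aggregate));
    }
    void update(Map<String, Object> row) {
      for (Slot<?> s : slots) s.update(row);
    }
    List<Object> result() {
      List<Object> values = new ArrayList<>();
      for (Slot<?> s : slots) values.add(s.result());
      return values;
    }
  }

  private static Map<String, Object> row(Integer id) {
    Map<String, Object> r = new HashMap<>();
    r.put("id", id); // HashMap allows the null value used below
    return r;
  }

  /** Aggregates count(id) and max(id) over three rows, one of which has a null id. */
  static List<Object> demo() {
    List<BoundAggregate<?>> aggregates = new ArrayList<>();
    aggregates.add(count("id"));
    aggregates.add(max("id"));
    Evaluator evaluator = new Evaluator(aggregates);
    evaluator.update(row(3));
    evaluator.update(row(null));
    evaluator.update(row(7));
    return evaluator.result();
  }
}
```

   In this sketch `AggregateSketch.demo()` returns a count of 2 and a max of 7. 
The engine-facing part is only the `Evaluator` loop, which is the piece that 
would not need to be reimplemented per engine.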
   
   This is based on #6252, but tries to keep as much logic as possible in 
core/API. What do you think? Could we incorporate this into #6252?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


