huaxingao commented on code in PR #6622: URL: https://github.com/apache/iceberg/pull/6622#discussion_r1095044926
##########
api/src/main/java/org/apache/iceberg/expressions/BoundAggregate.java:
##########
@@ -44,4 +57,85 @@ public Type type() {
       return term().type();
     }
   }
+
+  public String columnName() {
+    if (op() == Operation.COUNT_STAR) {
+      return "*";
+    } else {
+      return ref().name();
+    }
+  }
+
+  public String describe() {
+    switch (op()) {
+      case COUNT_STAR:
+        return "count(*)";
+      case COUNT:
+        return "count(" + ExpressionUtil.describe(term()) + ")";
+      case MAX:
+        return "max(" + ExpressionUtil.describe(term()) + ")";
+      case MIN:
+        return "min(" + ExpressionUtil.describe(term()) + ")";
+      default:
+        throw new UnsupportedOperationException("Unsupported aggregate type: " + op());
+    }
+  }
+
+  <V> V safeGet(Map<Integer, V> map, int key) {
+    return safeGet(map, key, null);
+  }
+
+  <V> V safeGet(Map<Integer, V> map, int key, V defaultValue) {
+    if (map != null) {
+      return map.getOrDefault(key, defaultValue);
+    }
+
+    return null;
+  }
+
+  interface Aggregator<R> {
+    void update(StructLike struct);
+
+    void update(DataFile file);
+
+    R result();
+  }
+
+  abstract static class NullSafeAggregator<T, R> implements Aggregator<R> {
+    private final BoundAggregate<T, R> aggregate;
+    private boolean isNull = false;
+
+    NullSafeAggregator(BoundAggregate<T, R> aggregate) {
+      this.aggregate = aggregate;
+    }
+
+    protected abstract void update(R value);
+
+    protected abstract R current();
+
+    @Override
+    public void update(StructLike struct) {
+      if (!isNull) {
+        R value = aggregate.eval(struct);
+        update(value);
+      }
+    }
+
+    @Override
+    public void update(DataFile file) {
+      if (!isNull) {
+        R value = aggregate.eval(file);
+        update(value);

Review Comment:
If one data file evaluates to `null`, I think we still want to evaluate the rest of the data files.
For example:

```
CREATE TABLE test (id LONG, data INT) USING iceberg PARTITIONED BY (id);
INSERT INTO TABLE test VALUES (1, null), (1, null), (2, 33), (2, 44), (3, 55), (3, 66);
SELECT max(data) FROM test;
```

For `max(data)`, the first data file (the `id = 1` partition, which holds only nulls) evaluates to `null`, but I think we still want to evaluate the rest of the data files so that `max(data)` returns `66`.
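To make the suggestion concrete, here is a minimal, standalone sketch of a max aggregator that skips a `null` per-file value instead of stopping. All names in it (`NullSkippingMax`, `update`, `result`, `maxOf`) are hypothetical and are not part of this PR or of the Iceberg API. It also assumes that a `null` per-file result means the file contains only nulls for the column; a file with missing stats would need separate handling.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not the PR's NullSafeAggregator.
public class NullSkippingMax {
  // Stays null until at least one non-null per-file value has been seen.
  private Integer current = null;

  // Assumption: fileMax == null means "this file has only nulls for the column",
  // so it is skipped and aggregation continues with the next file.
  public void update(Integer fileMax) {
    if (fileMax == null) {
      return;
    }

    if (current == null || fileMax > current) {
      current = fileMax;
    }
  }

  // Returns null only if every file evaluated to null.
  public Integer result() {
    return current;
  }

  // Convenience helper: fold a list of per-file max values into one result.
  public static Integer maxOf(List<Integer> perFileMax) {
    NullSkippingMax agg = new NullSkippingMax();
    for (Integer value : perFileMax) {
      agg.update(value);
    }

    return agg.result();
  }

  public static void main(String[] args) {
    // Per-file max(data) for the example table: id=1 -> null, id=2 -> 44, id=3 -> 66.
    System.out.println(maxOf(Arrays.asList(null, 44, 66)));
  }
}
```

With the three partitions from the example, the `null` from the `id = 1` file is skipped and the result is `66`. If every file evaluates to `null`, the result stays `null`, which matches what SQL `max` returns over all-null input.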