stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1633577198
##########
flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java:
##########
@@ -35,28 +35,28 @@ class AggregatedStatistics implements Serializable {
   private final long checkpointId;
   private final StatisticsType type;
   private final Map<SortKey, Long> keyFrequency;
-  private final SortKey[] rangeBounds;
+  private final SortKey[] keySamples;

Review Comment:
   Yes, conceptually we have two types of objects here. For map statistics, there is no difference between global statistics and complete statistics, as there is no further reduction in stats size. For sketch statistics, global statistics are a lot smaller because they hold range bounds.

   We can introduce two types, `CompleteStatistics` and `GlobalStatistics`. We can also introduce a base type, `AggregatedStatistics`. I am trying to avoid duplicating the `AggregatedStatisticsSerializer`, as it can work for both types. Maybe generics can enable code reuse and remove most of the duplication.
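
   A rough sketch of the generics idea, not the actual PR code. The names `AggregatedStats`, `CompleteStats`, `GlobalStats`, `StatsSerializer`, and `Factory` are hypothetical. `String` keys stand in for `SortKey`, and the `StatisticsType` field is left out to keep the example self-contained. The serializer is written once against the base type and takes a constructor reference for the concrete type:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical base type holding the state that both variants share.
abstract class AggregatedStats {
  final long checkpointId;
  final Map<String, Long> keyFrequency; // map statistics; empty for sketch statistics
  final List<String> keys;              // sketch statistics; empty for map statistics

  AggregatedStats(long checkpointId, Map<String, Long> keyFrequency, List<String> keys) {
    this.checkpointId = checkpointId;
    this.keyFrequency = keyFrequency;
    this.keys = keys;
  }
}

// Merged statistics before any reduction: `keys` holds the key samples.
final class CompleteStats extends AggregatedStats {
  CompleteStats(long id, Map<String, Long> freq, List<String> keySamples) {
    super(id, freq, keySamples);
  }

  List<String> keySamples() { return keys; }
}

// Statistics after reduction: `keys` holds the (smaller) range bounds.
final class GlobalStats extends AggregatedStats {
  GlobalStats(long id, Map<String, Long> freq, List<String> rangeBounds) {
    super(id, freq, rangeBounds);
  }

  List<String> rangeBounds() { return keys; }
}

// One serializer for both variants; only the factory differs per type.
final class StatsSerializer<T extends AggregatedStats> {
  interface Factory<R> {
    R create(long checkpointId, Map<String, Long> freq, List<String> keys);
  }

  private final Factory<T> factory;

  StatsSerializer(Factory<T> factory) { this.factory = factory; }

  byte[] serialize(T stats) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(stats.checkpointId);
      out.writeInt(stats.keyFrequency.size());
      for (Map.Entry<String, Long> e : stats.keyFrequency.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeLong(e.getValue());
      }
      out.writeInt(stats.keys.size());
      for (String key : stats.keys) {
        out.writeUTF(key);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  T deserialize(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      long id = in.readLong();
      int freqSize = in.readInt();
      Map<String, Long> freq = new TreeMap<>();
      for (int i = 0; i < freqSize; i++) {
        String key = in.readUTF();
        freq.put(key, in.readLong());
      }
      int keyCount = in.readInt();
      List<String> keys = new ArrayList<>(keyCount);
      for (int i = 0; i < keyCount; i++) {
        keys.add(in.readUTF());
      }
      return factory.create(id, freq, keys);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

   With this shape, `new StatsSerializer<>(CompleteStats::new)` and `new StatsSerializer<>(GlobalStats::new)` share all of the read and write logic, while callers still get a distinct type for each stage.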