stevenzwu commented on code in PR #10457:
URL: https://github.com/apache/iceberg/pull/10457#discussion_r1633577198


##########
flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java:
##########
@@ -35,28 +35,28 @@ class AggregatedStatistics implements Serializable {
   private final long checkpointId;
   private final StatisticsType type;
   private final Map<SortKey, Long> keyFrequency;
-  private final SortKey[] rangeBounds;
+  private final SortKey[] keySamples;

Review Comment:
   Yes, conceptually we have two types of objects here. For map statistics, 
there is no difference between global statistics and completed statistics, 
because the statistics size is not reduced any further. For sketch statistics, 
global statistics are much smaller, since they carry range bounds instead of 
the key samples.
   
   We can introduce two types, `CompleteStatistics` and `GlobalStatistics`. We 
can also introduce a base type, `AggregatedStatistics`. I am trying to avoid 
duplicating the `AggregatedStatisticsSerializer`, as it can work for both types. 
Maybe generics can enable code reuse and remove most of the duplication.
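
   A minimal sketch of how generics might let one serializer cover both 
types. Everything here is an assumption for illustration, not the code in this 
PR: the class shapes, the hypothetical `StatisticsFactory` interface, the use of 
`String[]` in place of `SortKey[]`, and plain `java.io` streams instead of 
Flink's serializer API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical base type with the fields both statistics kinds share.
abstract class AggregatedStatistics {
  final long checkpointId;
  final String[] keys; // stands in for SortKey[]: key samples or range bounds

  AggregatedStatistics(long checkpointId, String[] keys) {
    this.checkpointId = checkpointId;
    this.keys = keys;
  }
}

// Hypothetical subtype: keys are the merged key samples.
final class CompleteStatistics extends AggregatedStatistics {
  CompleteStatistics(long checkpointId, String[] keySamples) {
    super(checkpointId, keySamples);
  }
}

// Hypothetical subtype: keys are the (smaller) range bounds.
final class GlobalStatistics extends AggregatedStatistics {
  GlobalStatistics(long checkpointId, String[] rangeBounds) {
    super(checkpointId, rangeBounds);
  }
}

// Hypothetical factory so the serializer can rebuild the concrete subtype.
interface StatisticsFactory<T extends AggregatedStatistics> {
  T create(long checkpointId, String[] keys);
}

// One serializer, parameterized by the concrete statistics type.
final class AggregatedStatisticsSerializer<T extends AggregatedStatistics> {
  private final StatisticsFactory<T> factory;

  AggregatedStatisticsSerializer(StatisticsFactory<T> factory) {
    this.factory = factory;
  }

  byte[] serialize(T stats) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(stats.checkpointId);
      out.writeInt(stats.keys.length);
      for (String key : stats.keys) {
        out.writeUTF(key);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  T deserialize(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      long checkpointId = in.readLong();
      String[] keys = new String[in.readInt()];
      for (int i = 0; i < keys.length; i++) {
        keys[i] = in.readUTF();
      }
      // The factory decides which subtype comes back; the wire format is shared.
      return factory.create(checkpointId, keys);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

   With this shape, the two call sites would differ only in the constructor 
reference they pass, for example 
`new AggregatedStatisticsSerializer<>(GlobalStatistics::new)` versus 
`new AggregatedStatisticsSerializer<>(CompleteStatistics::new)`.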



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

