huyuanfeng2018 commented on code in PR #7494:
URL: https://github.com/apache/iceberg/pull/7494#discussion_r1184480444


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatistics.java:
##########
@@ -42,12 +43,19 @@
    *
    * @param key generate from data by applying key selector
    */
-  void add(K key);
+  void add(RowData key);

Review Comment:
   Is it necessary to add a method here to count statistical information with 
values? For example, data under different keys may have different sizes. Is it 
also a consideration when controlling subsequent balance,like:``` add(Rowdata 
key, V v)``` Among them, v may represent the record bytes of the row 
corresponding to the current key. What do you think?



##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java:
##########
@@ -40,50 +40,49 @@
  * shuffle record to improve data clustering while maintaining relative 
balanced traffic
  * distribution to downstream subtasks.
  */
-class DataStatisticsOperator<T, K> extends 
AbstractStreamOperator<DataStatisticsOrRecord<T, K>>
-    implements OneInputStreamOperator<T, DataStatisticsOrRecord<T, K>>, 
OperatorEventHandler {
+class DataStatisticsOperator<D extends DataStatistics<D, S>, S>

Review Comment:
   Ok, I think this DataStatisticsOperator is good for collecting some 
statistical information. It seems that we also need an Operator to determine 
the partitionID, such as PartitionIdAssignerOperator and then pass 
```org.apache.flink.api.java.functions.IdPartitioner``` From custom data 
distribution, I haven’t seen the design of this piece in the design document, 
can you briefly introduce the follow-up implementation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to