huyuanfeng2018 commented on code in PR #7494:
URL: https://github.com/apache/iceberg/pull/7494#discussion_r1184480444
##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatistics.java:
##########
@@ -42,12 +43,19 @@
*
* @param key generate from data by applying key selector
*/
- void add(K key);
+ void add(RowData key);
Review Comment:
Is it necessary to add a method here to count statistical information with
values? For example, data under different keys may have different sizes. Is it
also a consideration when controlling subsequent balance,like:``` add(Rowdata
key, V v)``` Among them, v may represent the record bytes of the row
corresponding to the current key. What do you think?
##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java:
##########
@@ -40,50 +40,49 @@
* shuffle record to improve data clustering while maintaining relative
balanced traffic
* distribution to downstream subtasks.
*/
-class DataStatisticsOperator<T, K> extends
AbstractStreamOperator<DataStatisticsOrRecord<T, K>>
- implements OneInputStreamOperator<T, DataStatisticsOrRecord<T, K>>,
OperatorEventHandler {
+class DataStatisticsOperator<D extends DataStatistics<D, S>, S>
Review Comment:
Ok, I think this DataStatisticsOperator is good for collecting some
statistical information. It seems that we also need an Operator to determine
the partitionID, such as PartitionIdAssignerOperator and then pass
```org.apache.flink.api.java.functions.IdPartitioner``` From custom data
distribution, I haven’t seen the design of this piece in the design document,
can you briefly introduce the follow-up implementation
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]