Re: [PR] Flink: put everything together for range distribution in Flink sink [iceberg]

via GitHub Tue, 06 Aug 2024 03:14:20 -0700


pvary commented on code in PR #10859:
URL: https://github.com/apache/iceberg/pull/10859#discussion_r1705281270



##########
flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java:
##########
@@ -233,15 +239,56 @@ public Builder flinkConf(ReadableConfig config) {
      * @return {@link Builder} to connect the iceberg table.
      */
     public Builder distributionMode(DistributionMode mode) {
-      Preconditions.checkArgument(
-          !DistributionMode.RANGE.equals(mode),
-          "Flink does not support 'range' write distribution mode now.");
       if (mode != null) {
         writeOptions.put(FlinkWriteOptions.DISTRIBUTION_MODE.key(), 
mode.modeName());
       }
       return this;
     }
 
+    /**
+     * Range distribution needs to collect statistics about data distribution 
to properly shuffle
+     * the records in relatively balanced way. In general, low cardinality 
should use {@link
+     * StatisticsType#Map} and high cardinality should use {@link 
StatisticsType#Sketch} Refer to
+     * {@link StatisticsType} Javadoc for more details.
+     *
+     * <p>Default is {@link StatisticsType#Auto} where initially Map 
statistics is used. But if
+     * cardinality is higher than some threshold (like 10K), statistics 
collection automatically

Review Comment:
   This threshold is hard-coded at this moment? Do we want to link the constant 
driving this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Flink: put everything together for range distribution in Flink sink [iceberg]

Reply via email to