pvary commented on code in PR #10859:
URL: https://github.com/apache/iceberg/pull/10859#discussion_r1705281270
##########
flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java:
##########
@@ -233,15 +239,56 @@ public Builder flinkConf(ReadableConfig config) {
* @return {@link Builder} to connect the iceberg table.
*/
public Builder distributionMode(DistributionMode mode) {
- Preconditions.checkArgument(
- !DistributionMode.RANGE.equals(mode),
- "Flink does not support 'range' write distribution mode now.");
if (mode != null) {
writeOptions.put(FlinkWriteOptions.DISTRIBUTION_MODE.key(), mode.modeName());
}
return this;
}
+ /**
+ * Range distribution needs to collect statistics about data distribution to properly shuffle
+ * the records in relatively balanced way. In general, low cardinality should use {@link
+ * StatisticsType#Map} and high cardinality should use {@link StatisticsType#Sketch} Refer to
+ * {@link StatisticsType} Javadoc for more details.
+ *
+ * <p>Default is {@link StatisticsType#Auto} where initially Map statistics is used. But if
+ * cardinality is higher than some threshold (like 10K), statistics collection automatically
Review Comment:
Is this threshold currently hard-coded? If so, should the Javadoc link to the
constant that drives it instead of saying "like 10K"?
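
To illustrate the suggestion, here is a rough sketch of Javadoc that points at a constant rather than restating its value. The class names, the constant name `SKETCH_SWITCH_THRESHOLD`, and the value 10,000 are all hypothetical placeholders (the value is taken only from the "like 10K" wording above); the actual constant in the PR may live elsewhere and be named differently.

```java
/** Hypothetical holder for the threshold; not the actual Iceberg class. */
final class StatisticsThresholds {
  /**
   * Hypothetical constant: number of distinct keys above which Auto statistics
   * would switch from Map to Sketch. The value is assumed for illustration.
   */
  static final int SKETCH_SWITCH_THRESHOLD = 10_000;

  private StatisticsThresholds() {}
}

/** Hypothetical class showing how the Javadoc could reference the constant. */
final class RangeDistributionDocs {
  private RangeDistributionDocs() {}

  /**
   * Default is Auto, where Map statistics are used initially. If cardinality exceeds
   * {@link StatisticsThresholds#SKETCH_SWITCH_THRESHOLD} (currently {@value
   * StatisticsThresholds#SKETCH_SWITCH_THRESHOLD}), collection switches to Sketch.
   *
   * @param distinctKeys number of distinct keys observed so far
   * @return true when the observed cardinality is strictly above the threshold
   */
  static boolean shouldSwitchToSketch(int distinctKeys) {
    return distinctKeys > StatisticsThresholds.SKETCH_SWITCH_THRESHOLD;
  }
}
```

With `{@link ...}` and `{@value ...}`, the rendered Javadoc follows the constant, so the documentation would not drift if the threshold changes later.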