pvary commented on code in PR #10859:
URL: https://github.com/apache/iceberg/pull/10859#discussion_r1705247759


##########
docs/docs/flink-writes.md:
##########
@@ -262,6 +262,91 @@ INSERT INTO tableName /*+ OPTIONS('upsert-enabled'='true') */
 
 Check out all the options here: [write-options](flink-configuration.md#write-options) 
 
+## Distribution mode
+
+Flink streaming writer supports both `HASH` and `RANGE` distribution mode.
+You can enable it via `FlinkSink#Builder#distributionMode(DistributionMode )`
+or via [write-options](flink-configuration.md#write-options).
+
+### Hash distribution
+
+HASH distribution shuffle data by partition key (partitioned table) or
+equality fields (non-partitioned table). It simply leverages Flink's
+`DataStream#keyBy` to distribute the data.
+
+HASH distribution has a few limitations.
+<ul>
+<li>It doesn't handle skewed data well. E.g. some partitions have a lot more data than others.
+<li>It can result in unbalanced traffic distribution if cardinality of the partition key or
+equality fields is low as demonstrated by [PR 4228](https://github.com/apache/iceberg/pull/4228).
+<li>Writer parallelism is limited to the cardinality of the hash key.
+if the cardinality is 10, only at most 10 writer tasks would get the traffic.

Review Comment:
   nit: capitalize the first letter here: `If`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

