stevenzwu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-2350416841
@binshuohu Currently, there is no plan to reapply this change to the main branch. We now have a more general range distribution, guided by statistics collection: https://iceberg.apache.org/docs/nightly/flink-writes/#range-distribution-experimental. It is more general than this change, which covers bucketing only. Range distribution also handles different parallelisms and partitions well.

Range distribution has one disadvantage: it collects and aggregates statistics to guide the range split, which adds a little overhead. The bucketing partitioner here instead assumes that traffic is evenly distributed across buckets, which should hold because records are assigned to buckets by hash (hash % nBuckets).

cc @pvary
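To make the trade-off concrete, here is a minimal sketch of bucket-based routing. It is hypothetical and not the code from PR #7161. It assumes each record's bucket id is computed as a non-negative hash modulo the bucket count, and that the partitioner sends bucket `b` to writer subtask `b % parallelism`. The names `BucketRoutingSketch`, `bucketOf`, `subtaskOf`, and `bucketsPerSubtask` are illustrative only. Iceberg's real bucket transform uses a 32-bit Murmur3 hash. `String.hashCode()` stands in for it here.

```java
import java.util.Arrays;

// Hypothetical sketch, not the PR #7161 implementation: route records to writer
// subtasks by bucket id, assuming traffic is spread evenly across buckets.
class BucketRoutingSketch {

    // Assumed bucket assignment: non-negative hash modulo the bucket count.
    // String.hashCode() is a stand-in for Iceberg's Murmur3-based bucket hash.
    static int bucketOf(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Assumed routing rule: bucket b always goes to subtask b % parallelism,
    // so each writer subtask owns a fixed subset of buckets.
    static int subtaskOf(int bucket, int parallelism) {
        return bucket % parallelism;
    }

    // Counts how many buckets each subtask owns. If traffic is even across
    // buckets, each subtask's load is proportional to its bucket count.
    static int[] bucketsPerSubtask(int numBuckets, int parallelism) {
        int[] counts = new int[parallelism];
        for (int b = 0; b < numBuckets; b++) {
            counts[subtaskOf(b, parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 8 buckets over 4 subtasks: prints [2, 2, 2, 2]
        System.out.println(Arrays.toString(bucketsPerSubtask(8, 4)));
        // 8 buckets over 3 subtasks: prints [3, 3, 2]
        System.out.println(Arrays.toString(bucketsPerSubtask(8, 3)));
        System.out.println("user-42 -> bucket " + bucketOf("user-42", 8));
    }
}
```

Under this assumed rule, no statistics are needed and balance follows from the even-hash assumption. However, when the bucket count is not a multiple of the writer parallelism, some subtasks own more buckets than others. A statistics-guided range split can adapt to that, at the cost of the collection and aggregation overhead.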