stevenzwu commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1762592732

   @chenwyi2 Is your point that we shouldn't consider only the bucketing 
column (as this PR does), and that you just want a plain keyBy in this case? 
That would be a fair point. Do you get a balanced traffic distribution among 
write tasks with a simple keyBy?
   
   I am also wondering whether the partition spec of dt, hour, minute, and 
bucket(id) is the best option, especially the minute column as a partition. 
Do you really need minute-level partition granularity? You are creating very 
fine-grained partitions. Even with the most optimal data distribution/shuffle, 
there will still be a lot of partitions and data files. 
   
   You used 8 as the number of buckets, which seems quite small for bucketing. 
What is the reason for using 8 buckets?
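
   To put rough numbers on the granularity concern, here is a minimal Python 
sketch. It assumes the spec is dt, hour, minute, bucket(id) with 8 buckets as 
discussed above, and that every (hour, minute, bucket) combination receives at 
least one record per day. The function name `partitions_per_day` is 
hypothetical and used only for illustration.

```python
# Back-of-the-envelope count of distinct partitions per dt value.
# Assumption: every (hour, minute, bucket) combination receives data.
HOURS_PER_DAY = 24
MINUTES_PER_HOUR = 60
NUM_BUCKETS = 8  # bucket count mentioned in this thread


def partitions_per_day(num_buckets: int, minute_level: bool = True) -> int:
    """Upper bound on distinct partitions created for a single dt value."""
    slots_per_hour = MINUTES_PER_HOUR if minute_level else 1
    return HOURS_PER_DAY * slots_per_hour * num_buckets


with_minute = partitions_per_day(NUM_BUCKETS)            # 24 * 60 * 8
without_minute = partitions_per_day(NUM_BUCKETS, False)  # 24 * 8
print(with_minute, without_minute)
```

   Since every non-empty partition holds at least one data file, the 
partition count under this assumption is also a lower bound on the number of 
data files written per day, regardless of how the shuffle is done.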
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


