RussellSpitzer commented on issue #8729:
URL: https://github.com/apache/iceberg/issues/8729#issuecomment-1751713732

   Everything Amogh said is correct: the write target file size is the maximum a writer will produce, not the minimum. The amount of data written to a file depends on the amount of data in the Spark task. That is controlled by the advisory partition size if you are using hash or range distribution; if you are not using any write distribution, it is simply the size of the Spark tasks.
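   
   As a rough sketch of how those knobs fit together (the table name and byte values below are placeholders, not from this thread; the property and config names are the standard Iceberg table properties and Spark AQE setting):
   
   ```scala
   // Illustrative only: db.events and the sizes are placeholders.
   spark.sql("""
     ALTER TABLE db.events SET TBLPROPERTIES (
       'write.target-file-size-bytes' = '536870912',  -- 512 MB upper bound per file, not a minimum
       'write.distribution-mode'      = 'hash'        -- hash/range adds a shuffle before the write
     )
   """)
   
   // With hash/range distribution, the shuffle's advisory partition size drives
   // how much data each write task (and therefore each data file) receives.
   spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "268435456")  // 256 MB
   ```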
   
   As for your questions,
   
   1. It's whatever the shuffle engine estimates the serialized size of the Spark rows to be.
   2. Yes
   
   BUT these only apply if a shuffle happens before the write, which is only the case when the write distribution mode is hash or range. Iceberg has no additional coalescing rules; see the sketch below.
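   
   For example (a sketch, not from this comment: `df`, the table name, and the partition count are all placeholders), with distribution mode `none` there is no pre-write shuffle and no coalescing, so each task writes whatever data it already holds; repartitioning the DataFrame before the write is one way to shape the tasks, and thus the files, yourself:
   
   ```scala
   // Assumes write.distribution-mode = 'none' on the target table.
   // Each of the 64 resulting tasks writes its own data as-is.
   df.repartition(64)
     .writeTo("db.events")
     .append()
   ```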

