kevinjqliu commented on PR #1232:
URL: https://github.com/apache/iceberg-python/pull/1232#issuecomment-2424118781

   Hi @mths1, thanks for the feedback. You're right: 
`write.target-file-size-bytes` does not represent the resulting file's size on 
disk. It is based on the size of the in-memory Arrow buffers, and since Parquet 
data is typically compressed, the resulting file can be smaller. 
   
   This aligns with 
https://iceberg.apache.org/docs/latest/spark-writes/#controlling-file-sizes
   
   Perhaps we can mention this behavior in the table. For example, this is what 
the [Java 
docs](https://iceberg.apache.org/docs/latest/configuration/#write-properties) 
say:
   ```
   write.target-file-size-bytes | 536870912 (512 MB) | Controls the size of 
files generated to target about this many bytes
   ```
   
   Maybe something like 
   ```
   Controls the target size of in-memory buffers for writing files. The actual 
file size may be smaller due to compression.
   ```
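
To illustrate the gap between the in-memory target and the on-disk size, here is a minimal standard-library sketch. It is not PyIceberg's implementation: `split_by_in_memory_size` is a hypothetical helper, the rows are made-up sample data, and `zlib` stands in for Parquet's compression purely as an assumption for demonstration.

```python
import zlib


def split_by_in_memory_size(rows, target_bytes):
    """Group rows into batches whose *uncompressed* size stays within target_bytes.

    Hypothetical helper: it mimics sizing batches by in-memory bytes,
    not by the size of the file that is eventually written.
    """
    batches, current, current_size = [], [], 0
    for row in rows:
        # Start a new batch once adding this row would exceed the target.
        if current and current_size + len(row) > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += len(row)
    if current:
        batches.append(current)
    return batches


# Made-up, repetitive sample data (16 bytes per row), which compresses well.
rows = [f"user-{i % 10:04d},click\n".encode() for i in range(20_000)]
target_bytes = 64 * 1024  # stand-in for write.target-file-size-bytes

batches = split_by_in_memory_size(rows, target_bytes)
in_memory_sizes = [sum(len(r) for r in batch) for batch in batches]
# zlib is an assumption standing in for Parquet compression.
on_disk_sizes = [len(zlib.compress(b"".join(batch))) for batch in batches]

for mem, disk in zip(in_memory_sizes, on_disk_sizes):
    print(f"in-memory: {mem:>6} bytes -> compressed: {disk:>6} bytes")
```

Each batch respects the target in memory, yet the compressed output is smaller, which is the behavior the proposed wording describes.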


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org