kevinjqliu commented on PR #1232: URL: https://github.com/apache/iceberg-python/pull/1232#issuecomment-2424118781
Hi @mths1,

Thanks for the feedback. You're right, `write.target-file-size-bytes` does not represent the resulting file's size on disk. It is based on the size of the in-memory Arrow buffers, and since Parquet data can be compressed, the resulting file can be smaller. This aligns with https://iceberg.apache.org/docs/latest/spark-writes/#controlling-file-sizes

Perhaps we can mention this behavior in the table. For example, this is what the [Java docs](https://iceberg.apache.org/docs/latest/configuration/#write-properties) mention:

```
write.target-file-size-bytes | 536870912 (512 MB) | Controls the size of files generated to target about this many bytes
```

Maybe something like:

```
Controls the target size of in-memory buffers for writing files. The actual file size may be smaller due to compression.
```
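To illustrate why a target measured on in-memory data yields smaller files on disk, here is a minimal stdlib-only sketch. It is not PyIceberg's writer: `split_by_in_memory_size` is a hypothetical helper, the rows are made-up sample data, and `zlib` merely stands in for Parquet compression.

```python
import zlib


def split_by_in_memory_size(rows, target_bytes):
    """Group rows into batches whose uncompressed size stays within target_bytes.

    Hypothetical helper for illustration only; it is not part of PyIceberg.
    """
    batches, current, current_size = [], [], 0
    for row in rows:
        # Start a new batch once adding this row would exceed the target.
        if current and current_size + len(row) > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += len(row)
    if current:
        batches.append(current)
    return batches


# Made-up, repetitive sample rows (repetitive data compresses well).
rows = [f"user-{i % 10},click,2024-10-19\n".encode() for i in range(10_000)]
target_bytes = 64 * 1024  # plays the role of write.target-file-size-bytes

batches = split_by_in_memory_size(rows, target_bytes)

# Size used for the split: uncompressed, in-memory bytes per batch.
in_memory_sizes = [sum(len(r) for r in batch) for batch in batches]

# Size that would land "on disk" after compression (zlib as a stand-in).
on_disk_sizes = [len(zlib.compress(b"".join(batch))) for batch in batches]

for mem, disk in zip(in_memory_sizes, on_disk_sizes):
    print(f"in-memory: {mem} bytes, compressed: {disk} bytes")
```

Each batch respects the target in memory, yet every compressed output is smaller than the target, which is the behavior the proposed wording describes.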