kevinjqliu commented on PR #829: URL: https://github.com/apache/iceberg-python/pull/829#issuecomment-2215827324
> First of all, sorry for the late reply. Feel free to ping me more aggressively.

No worries at all, I forgot to ping about this PR.

> How about re-aligning the table before we write, otherwise we have to do all of this when reading. Most tables have far fewer writes than reads, so it is good to optimize for reads.

Can you talk a bit more about "re-aligning"? Is it to match the parquet schema with that of Iceberg's? I see that `to_requested_schema` is currently used to coerce the data before it is written to parquet.
https://github.com/apache/iceberg-python/blame/7dff359e0515839fbe24fac2108dcb2d64694b7a/pyiceberg/io/pyarrow.py#L1915-L1918

Is the idea to do so for the entire arrow table before writing? If so, maybe we can push the `to_requested_schema` up the stack and simplify `write_parquet`. I also mentioned this in https://github.com/apache/iceberg-python/pull/786#discussion_r1646417180

@Fokko
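
For illustration only, here is a toy sketch of what "align the whole table once, before the write" could look like. It uses plain Python containers rather than PyArrow or PyIceberg, and every name in it (`align_to_schema`, `write_rows`, the `Table` and `Schema` aliases) is hypothetical; it does not reflect how `to_requested_schema` or `write_parquet` are actually implemented.

```python
from typing import Any, Callable

# Hypothetical stand-ins: a "table" is a dict of column name -> list of values,
# and a "schema" is an ordered list of (column name, cast function) pairs.
Table = dict[str, list[Any]]
Schema = list[tuple[str, Callable[[Any], Any]]]


def align_to_schema(table: Table, schema: Schema) -> Table:
    """Reorder, cast, and null-fill columns so the table matches the schema.

    Columns present in the table but absent from the schema are dropped.
    """
    num_rows = len(next(iter(table.values()), []))
    aligned: Table = {}
    for name, cast in schema:
        if name in table:
            # Cast every non-null value to the type the schema asks for.
            aligned[name] = [None if v is None else cast(v) for v in table[name]]
        else:
            # Column missing from the input: fill it with nulls.
            aligned[name] = [None] * num_rows
    return aligned


def write_rows(table: Table) -> list[tuple[Any, ...]]:
    """Toy writer: assumes the table is already aligned and only emits rows."""
    return list(zip(*table.values()))


target: Schema = [("id", int), ("name", str), ("score", float)]
incoming: Table = {"name": ["a", "b"], "id": ["1", "2"]}

# Align once, up the stack, so the writer itself needs no coercion logic.
rows = write_rows(align_to_schema(incoming, target))
```

In this model the writer stays trivial because the coercion happens in a single place before it is called, which is the simplification suggested above for `write_parquet`.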