Fokko commented on issue #30: URL: https://github.com/apache/iceberg-python/issues/30#issuecomment-1922123177
Just for context, in case it helps: I was recently experimenting with pushing the union of the tables down into Arrow, including all of the schema evolution. This would save PyIceberg from doing that work [itself](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1124-L1133), which is slow. The idea was to create an empty table with the requested schema and then union all the Parquet files onto it, using the [new](https://github.com/apache/arrow/pull/36846) option in concat tables that does the schema evolution automatically. The missing piece is that Arrow [cannot re-order](https://github.com/apache/arrow/issues/38615) struct fields :(