Fokko commented on issue #30:
URL: https://github.com/apache/iceberg-python/issues/30#issuecomment-1922123177

   Just for context, in case it helps: I was recently experimenting with pushing 
the union of the tables down into Arrow, including all of the schema evolution. 
This would save PyIceberg from doing it 
[itself](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1124-L1133), 
which is slow. The idea was to create an empty table with the requested 
schema and then union all the Parquet files onto it, using the 
[new](https://github.com/apache/arrow/pull/36846) option in `concat_tables` that 
performs schema evolution automatically. The missing piece is that Arrow [cannot 
re-order](https://github.com/apache/arrow/issues/38615) struct fields :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

