Re: [I] Expose PyIceberg table as PyArrow Dataset [iceberg-python]

via GitHub Thu, 01 Feb 2024 11:55:25 -0800


Fokko commented on issue #30:
URL: https://github.com/apache/iceberg-python/issues/30#issuecomment-1922123177


   Just for context, in case it helps: I was recently experimenting with
pushing the union of the tables down into Arrow, including all the schema
evolution. This would save PyIceberg from doing that work
[itself](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1124-L1133),
which is slow. The idea was to create an empty table with the requested
schema and then union all the Parquet files onto it, using the
[new](https://github.com/apache/arrow/pull/36846) option in `concat_tables`
that automatically handles schema evolution. The missing piece is that Arrow
[cannot re-order](https://github.com/apache/arrow/issues/38615) struct fields :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org
