sergun commented on issue #601: URL: https://github.com/apache/iceberg-python/issues/601#issuecomment-2054129400
Thank you @kevinjqliu! Do you know how to read a Parquet file with a unified schema in pyarrow? I successfully merged the schemas:

```
import pyarrow as pa
import pyarrow.parquet as pq

t1 = pq.read_table("data/1.parquet")
t2 = pq.read_table("data/2.parquet")
schema = pa.unify_schemas([t1.schema, t2.schema])
print(schema)
```

but the next lines give an error:

```
# Re-read each file, asking pyarrow to apply the unified schema
t1 = pq.read_table("data/1.parquet", schema=schema)
t2 = pq.read_table("data/2.parquet", schema=schema)
# union of t1 and t2 and write to Iceberg should follow
```

```
pyarrow.lib.ArrowTypeError: struct fields don't match or are in the wrong order: Input fields: struct<z: int64, x: int64> output fields: struct<z: int64, x: int64, y: int64, w: struct<w1: int64>>
```

Regarding DuckDB: unfortunately, `union_by_name` does not work for nested Parquet files whose schemas change at any level of the nested structures. BTW, it does work for JSON in DuckDB. Here is my question in the DuckDB discussions: https://github.com/duckdb/duckdb/discussions/11633