Re: [I] Create iceberg table from existing parquet files with slightly different schemas (schemas merge is possible). [iceberg-python]

via GitHub Sun, 14 Apr 2024 10:33:55 -0700


sergun commented on issue #601:
URL: https://github.com/apache/iceberg-python/issues/601#issuecomment-2054129400


   Thank you @kevinjqliu!
   Do you know how to read a Parquet file with a unified schema in pyarrow?
   
   I successfully merged the schemas:
   ```
   import pyarrow as pa
   import pyarrow.parquet as pq

   t1 = pq.read_table("data/1.parquet")
   t2 = pq.read_table("data/2.parquet")
   schema = pa.unify_schemas([t1.schema, t2.schema])
   print(schema)
   ```
   but the following lines raise an error:
   ```
   t1 = pq.read_table("data/1.parquet", schema=schema)
   t2 = pq.read_table("data/2.parquet", schema=schema)
   # union of t1 and t2 and write to iceberg should follow
   ```
   
   ```
   pyarrow.lib.ArrowTypeError: struct fields don't match or are in the wrong order: Input fields: struct<z: int64, x: int64> output fields: struct<z: int64, x: int64, y: int64, w: struct<w1: int64>>
   ```
   
   Regarding DuckDB: unfortunately, union by name does not work for nested Parquet files whose schemas change at any level of the nested structures. It does work for JSON in DuckDB, though. Here is my question in the DuckDB discussions:
   https://github.com/duckdb/duckdb/discussions/11633
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

