[I] Use of pyarrow_to_schema [iceberg-python]

via GitHub Fri, 12 Dec 2025 05:59:02 -0800


tjulioh opened a new issue, #2825:
URL: https://github.com/apache/iceberg-python/issues/2825


   ### Question
   
   Hello,
   
   I have Iceberg tables that I load with `load_table` through the Glue catalog.
   
   I want to compare them with Parquet files that I read through DuckDB:
   `pqt_data = duckdb.sql(f"SELECT * FROM read_parquet({pqt}, union_by_name=true)").arrow()`.
   After that, I want to check whether any columns differ so that I can perform
   schema evolution. I tried
   `pqt_schema = pyarrow_to_schema(pqt_data.schema)`, but it raises an error
   because the schema has neither field IDs nor a name mapping:
   `Parquet file does not have field-ids and the Iceberg table does not have 'schema.name-mapping.default' defined`.
   What is the reason for this error, and is there a solution? I just want to
   compare the column names and types in a simple way.
   
   Some code example:
   
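   ```
   import duckdb
   from pyiceberg.io.pyarrow import pyarrow_to_schema

   # `pqt` is defined elsewhere and is interpolated into the SQL text as-is.
   pqt_data = duckdb.sql(f"SELECT * FROM read_parquet({pqt}, union_by_name=true)").arrow()
   pqt_schema = pyarrow_to_schema(pqt_data.schema)  # raises the error quoted above
   ```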
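
For the "compare the types and names in a simple way" part, one possible approach is to skip the Iceberg conversion entirely and compare plain name-to-type mappings. The following is a minimal sketch, not PyIceberg's own method: `diff_columns` and the sample mappings are hypothetical, and the comment about building the mappings assumes the table's Arrow schema is available as `table.schema().as_arrow()`.

```python
from typing import Dict, Mapping


def diff_columns(
    table_cols: Mapping[str, str],
    file_cols: Mapping[str, str],
) -> Dict[str, object]:
    """Compare two {column name: type string} mappings."""
    # Columns present in the files but not in the table.
    added = sorted(set(file_cols) - set(table_cols))
    # Columns present in the table but not in the files.
    missing = sorted(set(table_cols) - set(file_cols))
    # Columns present in both whose type strings differ: (table type, file type).
    changed = {
        name: (table_cols[name], file_cols[name])
        for name in sorted(set(table_cols) & set(file_cols))
        if table_cols[name] != file_cols[name]
    }
    return {"added": added, "missing": missing, "changed": changed}


# Hypothetical inputs. With real data, each mapping could be built from a
# pyarrow schema as
#   dict(zip(schema.names, map(str, schema.types)))
# using `pqt_data.schema` and the table's Arrow schema (assumed here to be
# available as `table.schema().as_arrow()`).
table_cols = {"id": "int64", "name": "string", "amount": "double"}
file_cols = {"id": "int64", "name": "string", "amount": "float", "region": "string"}

result = diff_columns(table_cols, file_cols)
print(result)
```

Comparing type strings is deliberately crude: two schemas may describe the same logical type with different Arrow types (for example different string or timestamp variants), so a "changed" entry is a prompt to look closer rather than proof of an incompatible column.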


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

