amogh-jahagirdar commented on issue #6647:
URL: https://github.com/apache/iceberg/issues/6647#issuecomment-1401099271

   This needs a deeper investigation, but based on the stack trace it seems 
the metadata field on the parquet_schema is not set at all. Here's where 
pyarrow returns the schema: 
https://github.com/apache/arrow/blob/master/python/pyarrow/parquet/core.py#L3656
   
   I'm not entirely sure what this metadata field on parquet_schema 
represents, or whether every Parquet writer is expected to write it out.
   
   I'll double-check the Parquet spec, but if I had to hazard a guess, I would 
say writing it out is probably not required. In that case we need to perform 
the same check done here 
https://github.com/apache/iceberg/blob/master/python/pyiceberg/io/pyarrow.py#L509
   and raise an error linking to this known issue 
https://github.com/apache/iceberg/issues/6505
   which tracks that PyIceberg should derive the Iceberg schema from the 
actual Parquet schema.
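
   A minimal sketch of the kind of guard described above, not the actual 
PyIceberg code. It assumes the schema metadata arrives either as None or as a 
bytes-to-bytes mapping (which is how pyarrow exposes Schema.metadata), and 
that the embedded Iceberg schema is stored under the key b"iceberg.schema". 
The function name and the constant names are hypothetical.

```python
from typing import Mapping, Optional

# Assumed metadata key for the embedded Iceberg schema (illustrative).
ICEBERG_SCHEMA_KEY = b"iceberg.schema"
KNOWN_ISSUE_URL = "https://github.com/apache/iceberg/issues/6505"


def extract_iceberg_schema_json(
    metadata: Optional[Mapping[bytes, bytes]],
) -> bytes:
    """Return the raw embedded Iceberg schema, or raise a descriptive error.

    `metadata` is expected to be what pyarrow exposes as `Schema.metadata`:
    None when the writer stored no key-value metadata, otherwise a mapping
    of bytes keys to bytes values.
    """
    # Guard against a missing metadata mapping before looking up the key.
    schema_raw = metadata.get(ICEBERG_SCHEMA_KEY) if metadata else None
    if schema_raw is None:
        raise ValueError(
            "Iceberg schema is not embedded into the Parquet file, "
            f"see {KNOWN_ISSUE_URL}"
        )
    return schema_raw
```

   With a guard like this, a file written without any key-value metadata 
fails with a message pointing at the known issue instead of an attribute 
error on None.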


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

