amogh-jahagirdar commented on issue #6647: URL: https://github.com/apache/iceberg/issues/6647#issuecomment-1401099271
I need to investigate more deeply, but based on the stack trace it seems like the metadata field on the parquet_schema is not even defined. Here's where pyarrow returns the schema: https://github.com/apache/arrow/blob/master/python/pyarrow/parquet/core.py#L3656

I'm not entirely sure what this metadata field in parquet_schema holds, or whether every Parquet writer is expected to write it out. I'll double-check the Parquet spec, but if I had to hazard a guess, I would say writing it is probably not required. In that case we need to perform the same check done here https://github.com/apache/iceberg/blob/master/python/pyiceberg/io/pyarrow.py#L509 and raise an error linking to the known issue https://github.com/apache/iceberg/issues/6505, which tracks that PyIceberg should derive the Iceberg schema from the actual Parquet schema.
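Roughly what I have in mind, as a minimal sketch rather than the actual patch. The helper name `embedded_iceberg_schema` and the constant names are hypothetical, and I'm assuming the Iceberg schema is stored under the `b"iceberg.schema"` metadata key. The function takes the metadata mapping directly so it can be tried without a Parquet file; with pyarrow, the argument would be the `metadata` attribute of the schema, which may be `None`.

```python
from typing import Mapping, Optional

# Assumption: key under which the Iceberg schema JSON is embedded in the file.
ICEBERG_SCHEMA_KEY = b"iceberg.schema"
KNOWN_ISSUE_URL = "https://github.com/apache/iceberg/issues/6505"


def embedded_iceberg_schema(metadata: Optional[Mapping[bytes, bytes]]) -> bytes:
    """Return the raw embedded Iceberg schema, or raise a descriptive error.

    `metadata` stands in for `parquet_schema.metadata`, which can be None
    when the writer did not attach any key-value metadata.
    """
    schema_raw = None
    # Guard against metadata being None (or empty) before calling .get().
    if metadata:
        schema_raw = metadata.get(ICEBERG_SCHEMA_KEY)
    if schema_raw is None:
        raise ValueError(
            "Iceberg schema is not embedded into the Parquet file, "
            f"see {KNOWN_ISSUE_URL}"
        )
    return schema_raw
```

This way a file written without the metadata fails with a message pointing at the known issue instead of an attribute error on `None`.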