HonahX commented on code in PR #183: URL: https://github.com/apache/iceberg-python/pull/183#discussion_r1419923194
##########
pyiceberg/io/pyarrow.py:
##########
@@ -713,28 +714,50 @@ def primitive(self, primitive: pa.DataType) -> Optional[T]:
         """Visit a primitive type."""
-def _get_field_id(field: pa.Field) -> Optional[int]:
-    for pyarrow_field_id_key in PYARROW_FIELD_ID_KEYS:
-        if field_id_str := field.metadata.get(pyarrow_field_id_key):
-            return int(field_id_str.decode())
-    return None
+class _ConvertToIceberg(PyArrowSchemaVisitor[Union[IcebergType, Schema]]):
+    counter: count[int]
+    missing_is_metadata: Optional[bool]
+    def __init__(self) -> None:
+        self.counter = count()

Review Comment:
   I happen to have an Iceberg table (migrated from Delta Lake) whose Parquet files contain no field-id. With this change, I am now able to use pyiceberg to read its data. This is really great! Out of curiosity, are there any additional use cases where this PR might be beneficial?
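For readers following the hunk above, here is a minimal sketch of the idea it points at: read a field ID from metadata when one is stored, and fall back to a counter when the file carries none. This is not the PR's implementation. `Field` is a hypothetical stand-in for `pa.Field`, the two metadata keys are assumed values for `PYARROW_FIELD_ID_KEYS`, and rejecting a mix of tagged and untagged fields is an assumption rather than something the diff confirms.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional

# Assumed key values; the diff only references PYARROW_FIELD_ID_KEYS by name.
FIELD_ID_KEYS = [b"PARQUET:field_id", b"field_id"]


@dataclass
class Field:
    """Hypothetical stand-in for pa.Field: a name plus bytes-to-bytes metadata."""

    name: str
    metadata: Dict[bytes, bytes] = field(default_factory=dict)


def get_field_id(f: Field) -> Optional[int]:
    """Same lookup as the removed _get_field_id helper, applied to the stand-in."""
    for key in FIELD_ID_KEYS:
        if field_id_str := f.metadata.get(key):
            return int(field_id_str.decode())
    return None


def assign_field_ids(fields: List[Field]) -> Dict[str, int]:
    """Use stored IDs if every field has one; otherwise number fields from a counter."""
    stored = [get_field_id(f) for f in fields]
    if all(i is not None for i in stored):
        return {f.name: i for f, i in zip(fields, stored)}
    if any(i is not None for i in stored):
        # Assumption: a partially tagged schema is treated as an error.
        raise ValueError("schema mixes fields with and without field IDs")
    counter = count()  # starts at 0, matching the bare count() in the diff
    return {f.name: next(counter) for f in fields}


# Files written without field IDs, as in the migrated table described above.
legacy = [Field("id"), Field("name")]
print(assign_field_ids(legacy))
```

The all-or-nothing check keeps counter-assigned IDs from colliding with IDs that were already stored in the file.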