kevinjqliu commented on code in PR #829: URL: https://github.com/apache/iceberg-python/pull/829#discussion_r1676323868
########## pyiceberg/table/__init__.py: ########## @@ -484,10 +484,6 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) _check_schema_compatible( self._table.schema(), other_schema=df.schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us ) - # cast if the two schemas are compatible but not equal Review Comment: @syun64 I want to get your take on this part. Due to the timestamp change, do you know if the `df` need to be casted? There are a couple of different parts involved in the write path. In particular, we need to look at the table schema, the df schema, and the df itself. As well as dealing with bin-packing and other transformations. ########## pyiceberg/table/__init__.py: ########## @@ -484,10 +484,6 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) _check_schema_compatible( self._table.schema(), other_schema=df.schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us ) - # cast if the two schemas are compatible but not equal Review Comment: Happy to extract this convo into an issue, to also continue the convo from https://github.com/apache/iceberg-python/pull/786#discussion_r1646417180 ########## pyiceberg/io/pyarrow.py: ########## @@ -2053,7 +2055,10 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." ) from e - if table_schema.as_struct() != task_schema.as_struct(): + fields_missing_from_table = {field for field in other_schema.fields if field not in table_schema.fields} Review Comment: this doesn't work for nested structs, need a better solution -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org