Fokko commented on code in PR #411: URL: https://github.com/apache/iceberg-python/pull/411#discussion_r1485647073
########## pyiceberg/table/metadata.py: ########## @@ -260,8 +260,10 @@ def set_v2_compatible_defaults(cls, data: Dict[str, Any]) -> Dict[str, Any]: The TableMetadata with the defaults applied. """ # When the schema doesn't have an ID - if data.get("schema") and "schema_id" not in data["schema"]: - data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID + schema = data.get("schema") + if isinstance(schema, dict): + if "schema_id" not in schema: Review Comment: We want to check for the alias here as well. ```suggestion if "schema_id" not in schema and "schema-id" not in schema: ``` ########## pyiceberg/table/metadata.py: ########## @@ -411,6 +442,24 @@ def new_table_metadata( if table_uuid is None: table_uuid = uuid.uuid4() + # Remove format-version so it does not get persisted + format_version = int(properties.pop("format-version", DEFAULT_FORMAT_VERSION)) Review Comment: What do you think of moving the literal and constant to `TableProperties`? ########## pyiceberg/table/metadata.py: ########## @@ -313,6 +315,34 @@ def construct_partition_specs(cls, data: Dict[str, Any]) -> Dict[str, Any]: return data + @model_validator(mode="before") + def construct_v1_spec_from_v2_fields(cls, data: Dict[str, Any]) -> Dict[str, Any]: Review Comment: I'd rather avoid the before validators to patch in the partition-spec and schema (probably sort-order as well?). I'd much rather see that we explicitly pass in the spec below: ```python if format_version == 1: return TableMetadataV1( location=location, schema=fresh_schema, last_column_id=fresh_schema.highest_field_id, current_schema_id=fresh_schema.schema_id, + partition_spec=fresh_partition_spec, partition_specs=[fresh_partition_spec], default_spec_id=fresh_partition_spec.spec_id, + sort_order=fresh_sort_order, sort_orders=[fresh_sort_order], default_sort_order_id=fresh_sort_order.order_id, properties=properties, last_partition_id=fresh_partition_spec.last_assigned_field_id, table_uuid=table_uuid, ) ``` My biggest concern is again that this is a before validator, and we don't take into account the aliases (`partition-specs`, `partition-spec-id`, etc). It looks like Java just ignores the `schema` if there is a `schemas`: https://github.com/apache/iceberg/blob/5f577f1b9902ffe6181897a439686a31fc81b89a/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L343-L368 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org