HonahX commented on code in PR #411:
URL: https://github.com/apache/iceberg-python/pull/411#discussion_r1485730495
##########
pyiceberg/table/metadata.py:
##########
@@ -404,13 +406,34 @@ def new_table_metadata(
     properties: Properties = EMPTY_DICT,
     table_uuid: Optional[uuid.UUID] = None,
 ) -> TableMetadata:
+    from pyiceberg.table import TableProperties
+
     fresh_schema = assign_fresh_schema_ids(schema)
     fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema)
     fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema)
 
     if table_uuid is None:
         table_uuid = uuid.uuid4()
 
+    # Remove format-version so it does not get persisted
+    format_version = int(properties.pop(TableProperties.FORMAT_VERSION, TableProperties.DEFAULT_FORMAT_VERSION))
+    if format_version == 1:
+        return TableMetadataV1(
+            location=location,
+            schema_=fresh_schema,

Review Comment:
   Is this needed? We set `schema=fresh_schema` below. It seems the resulting metadata is still correct if we remove this line.

##########
pyiceberg/table/metadata.py:
##########
@@ -404,13 +406,34 @@ def new_table_metadata(
     properties: Properties = EMPTY_DICT,
     table_uuid: Optional[uuid.UUID] = None,
 ) -> TableMetadata:
+    from pyiceberg.table import TableProperties
+
     fresh_schema = assign_fresh_schema_ids(schema)
     fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema)
     fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema)
 
     if table_uuid is None:
         table_uuid = uuid.uuid4()
 
+    # Remove format-version so it does not get persisted
+    format_version = int(properties.pop(TableProperties.FORMAT_VERSION, TableProperties.DEFAULT_FORMAT_VERSION))
+    if format_version == 1:
+        return TableMetadataV1(
+            location=location,
+            schema_=fresh_schema,
+            last_column_id=fresh_schema.highest_field_id,
+            current_schema_id=fresh_schema.schema_id,
+            schema=fresh_schema,
+            partition_spec=[fresh_partition_spec.model_dump()],

Review Comment:
   ```suggestion
               partition_spec=[field.model_dump() for field in fresh_partition_spec.fields],
   ```
   According to the [metadata JSON serialization spec](https://iceberg.apache.org/spec/#table-metadata-and-snapshots), I think `partition_spec` is a list of partition fields rather than a single PartitionSpec.

   Currently, this field stores
   ```
   'partition-spec': [{'fields': [], 'spec-id': 0}]
   ```
   for an unpartitioned v1 table, but it should be just an empty list in that case. I also checked an unpartitioned v1 table written by Spark, and its `partition_spec` is `[]`.
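
To make the difference in shape concrete, here is a minimal sketch. It assumes nothing about pyiceberg itself: `FakePartitionField` and `FakePartitionSpec` are hypothetical stand-ins written with plain dataclasses, and `to_dict` only mimics what `model_dump()` is expected to return. It compares wrapping the whole spec in a list (what the PR currently does) with emitting one entry per partition field (what the suggestion does).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-ins for PartitionField / PartitionSpec. They are NOT the
# pyiceberg classes; they only reproduce the shape of the serialized data.
@dataclass
class FakePartitionField:
    source_id: int
    field_id: int
    transform: str
    name: str

    def to_dict(self) -> Dict[str, Any]:
        return {
            "source-id": self.source_id,
            "field-id": self.field_id,
            "transform": self.transform,
            "name": self.name,
        }


@dataclass
class FakePartitionSpec:
    spec_id: int = 0
    fields: List[FakePartitionField] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        return {"fields": [f.to_dict() for f in self.fields], "spec-id": self.spec_id}


def wrapped_spec(spec: FakePartitionSpec) -> List[Dict[str, Any]]:
    """Shape in the PR as written: the whole spec wrapped in a one-element list."""
    return [spec.to_dict()]


def flat_fields(spec: FakePartitionSpec) -> List[Dict[str, Any]]:
    """Shape proposed in the suggestion: one entry per partition field."""
    return [f.to_dict() for f in spec.fields]


unpartitioned = FakePartitionSpec()
print(wrapped_spec(unpartitioned))  # [{'fields': [], 'spec-id': 0}]
print(flat_fields(unpartitioned))   # []
```

With the flat form, an unpartitioned table serializes to `[]`, which matches what I saw in the v1 table written by Spark.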