HonahX commented on code in PR #411:
URL: https://github.com/apache/iceberg-python/pull/411#discussion_r1485730495


##########
pyiceberg/table/metadata.py:
##########
@@ -404,13 +406,34 @@ def new_table_metadata(
     properties: Properties = EMPTY_DICT,
     table_uuid: Optional[uuid.UUID] = None,
 ) -> TableMetadata:
+    from pyiceberg.table import TableProperties
+
     fresh_schema = assign_fresh_schema_ids(schema)
     fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema)
     fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema)
 
     if table_uuid is None:
         table_uuid = uuid.uuid4()
 
+    # Remove format-version so it does not get persisted
+    format_version = int(properties.pop(TableProperties.FORMAT_VERSION, TableProperties.DEFAULT_FORMAT_VERSION))
+    if format_version == 1:
+        return TableMetadataV1(
+            location=location,
+            schema_=fresh_schema,

Review Comment:
   Is this needed? We already set `schema=fresh_schema` below, and the
   resulting metadata appears to be correct if this line is removed.



##########
pyiceberg/table/metadata.py:
##########
@@ -404,13 +406,34 @@ def new_table_metadata(
     properties: Properties = EMPTY_DICT,
     table_uuid: Optional[uuid.UUID] = None,
 ) -> TableMetadata:
+    from pyiceberg.table import TableProperties
+
     fresh_schema = assign_fresh_schema_ids(schema)
     fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec, schema, fresh_schema)
     fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema, fresh_schema)
 
     if table_uuid is None:
         table_uuid = uuid.uuid4()
 
+    # Remove format-version so it does not get persisted
+    format_version = int(properties.pop(TableProperties.FORMAT_VERSION, TableProperties.DEFAULT_FORMAT_VERSION))
+    if format_version == 1:
+        return TableMetadataV1(
+            location=location,
+            schema_=fresh_schema,
+            last_column_id=fresh_schema.highest_field_id,
+            current_schema_id=fresh_schema.schema_id,
+            schema=fresh_schema,
+            partition_spec=[fresh_partition_spec.model_dump()],

Review Comment:
   ```suggestion
               partition_spec=[field.model_dump() for field in fresh_partition_spec.fields],
   ```
   I think `partition_spec` should be a list of partition fields rather than a
   single `PartitionSpec`, according to the
   [metadata JSON serialization spec](https://iceberg.apache.org/spec/#table-metadata-and-snapshots).
   
   Currently, for an unpartitioned v1 table, this field stores
   ```
   'partition-spec': [{'fields': [], 'spec-id': 0}]
   ```
   but it should be just an empty list.
   
   I also checked an unpartitioned v1 table written by Spark, and its
   `partition_spec` is `[]`.
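
To illustrate the difference between the two serializations, here is a minimal, self-contained sketch. It does not use pyiceberg: `PartitionFieldSketch`, `PartitionSpecSketch`, `dump_whole_spec`, and `dump_fields_only` are hypothetical stand-ins that only mimic the `fields` attribute and a `model_dump()` method, and the key names in the dumped dictionaries are assumptions modeled on the output quoted above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class PartitionFieldSketch:
    """Hypothetical stand-in for a partition field (not the pyiceberg class)."""

    source_id: int
    field_id: int
    transform: str
    name: str

    def model_dump(self) -> Dict[str, Any]:
        # Key names are assumed for illustration only.
        return {
            "source-id": self.source_id,
            "field-id": self.field_id,
            "transform": self.transform,
            "name": self.name,
        }


@dataclass(frozen=True)
class PartitionSpecSketch:
    """Hypothetical stand-in for a partition spec (not the pyiceberg class)."""

    fields: Tuple[PartitionFieldSketch, ...] = ()
    spec_id: int = 0

    def model_dump(self) -> Dict[str, Any]:
        return {"fields": [f.model_dump() for f in self.fields], "spec-id": self.spec_id}


def dump_whole_spec(spec: PartitionSpecSketch) -> List[Dict[str, Any]]:
    # Shape produced by the line under review: the whole spec wrapped in a list.
    return [spec.model_dump()]


def dump_fields_only(spec: PartitionSpecSketch) -> List[Dict[str, Any]]:
    # Shape produced by the suggestion: one entry per partition field.
    return [f.model_dump() for f in spec.fields]


unpartitioned = PartitionSpecSketch()
by_day = PartitionSpecSketch(fields=(PartitionFieldSketch(1, 1000, "day", "ts_day"),))

print(dump_whole_spec(unpartitioned))
print(dump_fields_only(unpartitioned))
print(dump_fields_only(by_day))
```

With the field-level dump, an unpartitioned spec yields an empty list, which matches the Spark-written table mentioned above, while the whole-spec dump always yields a one-element list.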



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

