Fokko commented on code in PR #411:
URL: https://github.com/apache/iceberg-python/pull/411#discussion_r1485647073


##########
pyiceberg/table/metadata.py:
##########
@@ -260,8 +260,10 @@ def set_v2_compatible_defaults(cls, data: Dict[str, Any]) 
-> Dict[str, Any]:
             The TableMetadata with the defaults applied.
         """
         # When the schema doesn't have an ID
-        if data.get("schema") and "schema_id" not in data["schema"]:
-            data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID
+        schema = data.get("schema")
+        if isinstance(schema, dict):
+            if "schema_id" not in schema:

Review Comment:
   We want to check for the alias here as well.
   ```suggestion
               if "schema_id" not in schema and "schema-id" not in schema:
   ```



##########
pyiceberg/table/metadata.py:
##########
@@ -411,6 +442,24 @@ def new_table_metadata(
     if table_uuid is None:
         table_uuid = uuid.uuid4()
 
+    # Remove format-version so it does not get persisted
+    format_version = int(properties.pop("format-version", 
DEFAULT_FORMAT_VERSION))

Review Comment:
   What do you think of moving the literal and constant to `TableProperties`?



##########
pyiceberg/table/metadata.py:
##########
@@ -313,6 +315,34 @@ def construct_partition_specs(cls, data: Dict[str, Any]) 
-> Dict[str, Any]:
 
         return data
 
+    @model_validator(mode="before")
+    def construct_v1_spec_from_v2_fields(cls, data: Dict[str, Any]) -> 
Dict[str, Any]:

Review Comment:
   I'd rather avoid the before validators to patch in the partition-spec and 
schema (probably sort-order as well?). I'd much rather see that we explicitly 
pass in the spec below:
   
   ```python
       if format_version == 1:
           return TableMetadataV1(
               location=location,
               schema=fresh_schema,
               last_column_id=fresh_schema.highest_field_id,
               current_schema_id=fresh_schema.schema_id,
   +           partition_spec=fresh_partition_spec,
               partition_specs=[fresh_partition_spec],
               default_spec_id=fresh_partition_spec.spec_id,
   +           sort_order=fresh_sort_order,
               sort_orders=[fresh_sort_order],
               default_sort_order_id=fresh_sort_order.order_id,
               properties=properties,
               last_partition_id=fresh_partition_spec.last_assigned_field_id,
               table_uuid=table_uuid,
           )
   ```
   
   My biggest concern is again that this is a before validator, and we don't 
take into account the aliases (`partition-specs`, `partition-spec-id`, etc).
   
   It looks like Java just ignores the `schema` if there is a `schemas`: 
https://github.com/apache/iceberg/blob/5f577f1b9902ffe6181897a439686a31fc81b89a/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L343-L368



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to