Fokko commented on code in PR #411:
URL: https://github.com/apache/iceberg-python/pull/411#discussion_r1485413474


##########
pyiceberg/table/metadata.py:
##########
@@ -260,8 +260,10 @@ def set_v2_compatible_defaults(cls, data: Dict[str, Any]) 
-> Dict[str, Any]:
             The TableMetadata with the defaults applied.
         """
         # When the schema doesn't have an ID
-        if data.get("schema") and "schema_id" not in data["schema"]:
-            data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID
+        schema = data.get("schema")
+        if schema and isinstance(schema, dict):

Review Comment:
   This should do it:
   ```suggestion
           if isinstance(schema, dict):
   ```
   
   See:
   ```
   python3
   Python 3.11.7 (main, Dec  4 2023, 18:10:11) [Clang 15.0.0 
(clang-1500.1.0.2.5)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> isinstance(None, dict)
   False
   ```



##########
pyiceberg/table/metadata.py:
##########
@@ -260,8 +260,10 @@ def set_v2_compatible_defaults(cls, data: Dict[str, Any]) 
-> Dict[str, Any]:
             The TableMetadata with the defaults applied.
         """
         # When the schema doesn't have an ID
-        if data.get("schema") and "schema_id" not in data["schema"]:
-            data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID
+        schema = data.get("schema")
+        if schema and isinstance(schema, dict):
+            if "schema_id" not in data["schema"]:
+                data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID

Review Comment:
   Hey @amogh-jahagirdar Yes, this is a bit awkward in Pydantic. This is a 
`@model_validator(mode="before")` which means that it will always apply the 
validator before trying to construct the model. This will allow you to correct 
certain things before applying the checks. This particular function will make 
sure that the schema-id is set. Where `schema_id` is 
[required](https://github.com/apache/iceberg-python/blob/main/pyiceberg/schema.py#L81).
 Keep in mind that this also does have applied the aliases, so both `schema_id` 
and `schema-id` are valid.
   
   The before-mode model validator also be applied when you initialize 
something off: `TableMetadataV1(bla=(1,2,3))`
   
   The error that you're seeing is after the `mode="before"` validators are 
applied. I would put a breakpoint in the validator, and trace up the stack 
where the bogus data comes from.
   



##########
pyiceberg/table/metadata.py:
##########
@@ -260,8 +260,10 @@ def set_v2_compatible_defaults(cls, data: Dict[str, Any]) 
-> Dict[str, Any]:
             The TableMetadata with the defaults applied.
         """
         # When the schema doesn't have an ID
-        if data.get("schema") and "schema_id" not in data["schema"]:
-            data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID
+        schema = data.get("schema")
+        if schema and isinstance(schema, dict):
+            if "schema_id" not in data["schema"]:

Review Comment:
   ```suggestion
               if "schema_id" not in schema:
   ```



##########
tests/catalog/test_hive.py:
##########
@@ -294,6 +294,37 @@ def test_create_table(table_schema_simple: Schema, 
hive_database: HiveDatabase,
 
     assert metadata.model_dump() == expected.model_dump()
 
+    catalog.create_table(
+        ("default", "table_v1"), schema=table_schema_simple, 
properties={"owner": "javaberg", "format-version": "1"}
+    )
+    expected_v1 = TableMetadataV1(
+        location=metadata.location,
+        table_uuid=metadata.table_uuid,
+        last_updated_ms=metadata.last_updated_ms,
+        last_column_id=3,
+        schema=Schema(

Review Comment:
   I know that it is still required, but also deprecated:
   <img width="707" alt="image" 
src="https://github.com/apache/iceberg-python/assets/1134248/3b96e21c-9d41-40cb-be87-e583d166341d";>
   
   Setting `schemas` makes more sense to me.



##########
pyiceberg/table/metadata.py:
##########
@@ -411,6 +414,21 @@ def new_table_metadata(
     if table_uuid is None:
         table_uuid = uuid.uuid4()
 
+    format_version = properties.get("format-version", DEFAULT_FORMAT_VERSION)
+    if format_version == "1":

Review Comment:
   It looks like were mixing types here.
   ```suggestion
       format_version = int(properties.get("format-version", 
DEFAULT_FORMAT_VERSION))
       if format_version == 1:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to