Fokko commented on code in PR #411: URL: https://github.com/apache/iceberg-python/pull/411#discussion_r1485413474
########## pyiceberg/table/metadata.py: ########## @@ -260,8 +260,10 @@ def set_v2_compatible_defaults(cls, data: Dict[str, Any]) -> Dict[str, Any]: The TableMetadata with the defaults applied. """ # When the schema doesn't have an ID - if data.get("schema") and "schema_id" not in data["schema"]: - data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID + schema = data.get("schema") + if schema and isinstance(schema, dict): Review Comment: This should do it: ```suggestion if isinstance(schema, dict): ``` See: ``` python3 Python 3.11.7 (main, Dec 4 2023, 18:10:11) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> isinstance(None, dict) False ``` ########## pyiceberg/table/metadata.py: ########## @@ -260,8 +260,10 @@ def set_v2_compatible_defaults(cls, data: Dict[str, Any]) -> Dict[str, Any]: The TableMetadata with the defaults applied. """ # When the schema doesn't have an ID - if data.get("schema") and "schema_id" not in data["schema"]: - data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID + schema = data.get("schema") + if schema and isinstance(schema, dict): + if "schema_id" not in data["schema"]: + data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID Review Comment: Hey @amogh-jahagirdar Yes, this is a bit awkward in Pydantic. This is a `@model_validator(mode="before")` which means that it will always apply the validator before trying to construct the model. This will allow you to correct certain things before applying the checks. This particular function will make sure that the schema-id is set. Where `schema_id` is [required](https://github.com/apache/iceberg-python/blob/main/pyiceberg/schema.py#L81). Keep in mind that this also does have applied the aliases, so both `schema_id` and `schema-id` are valid. The before-mode model validator also be applied when you initialize something off: `TableMetadataV1(bla=(1,2,3))` The error that you're seeing is after the `mode="before"` validators are applied. I would put a breakpoint in the validator, and trace up the stack where the bogus data comes from. ########## pyiceberg/table/metadata.py: ########## @@ -260,8 +260,10 @@ def set_v2_compatible_defaults(cls, data: Dict[str, Any]) -> Dict[str, Any]: The TableMetadata with the defaults applied. """ # When the schema doesn't have an ID - if data.get("schema") and "schema_id" not in data["schema"]: - data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID + schema = data.get("schema") + if schema and isinstance(schema, dict): + if "schema_id" not in data["schema"]: Review Comment: ```suggestion if "schema_id" not in schema: ``` ########## tests/catalog/test_hive.py: ########## @@ -294,6 +294,37 @@ def test_create_table(table_schema_simple: Schema, hive_database: HiveDatabase, assert metadata.model_dump() == expected.model_dump() + catalog.create_table( + ("default", "table_v1"), schema=table_schema_simple, properties={"owner": "javaberg", "format-version": "1"} + ) + expected_v1 = TableMetadataV1( + location=metadata.location, + table_uuid=metadata.table_uuid, + last_updated_ms=metadata.last_updated_ms, + last_column_id=3, + schema=Schema( Review Comment: I know that it is still required, but also deprecated: <img width="707" alt="image" src="https://github.com/apache/iceberg-python/assets/1134248/3b96e21c-9d41-40cb-be87-e583d166341d"> Setting `schemas` makes more sense to me. ########## pyiceberg/table/metadata.py: ########## @@ -411,6 +414,21 @@ def new_table_metadata( if table_uuid is None: table_uuid = uuid.uuid4() + format_version = properties.get("format-version", DEFAULT_FORMAT_VERSION) + if format_version == "1": Review Comment: It looks like were mixing types here. ```suggestion format_version = int(properties.get("format-version", DEFAULT_FORMAT_VERSION)) if format_version == 1: ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org