kevinjqliu commented on code in PR #2866:
URL: https://github.com/apache/iceberg-python/pull/2866#discussion_r2649878625
##########
pyiceberg/table/update/__init__.py:
##########
@@ -179,13 +179,9 @@ class SetStatisticsUpdate(IcebergBaseModel):
description="snapshot-id is **DEPRECATED for REMOVAL** since it
contains redundant information. Use `statistics.snapshot-id` field instead.",
)
- @model_validator(mode="before")
- def validate_snapshot_id(cls, data: dict[str, Any]) -> dict[str, Any]:
- stats = cast(StatisticsFile, data["statistics"])
-
- data["snapshot_id"] = stats.snapshot_id
-
- return data
+ @model_validator(mode="after")
+ def validate_snapshot_id(self) -> Self:
+ return self.model_copy(update={"snapshot_id":
self.statistics.snapshot_id})
Review Comment:
```suggestion
@model_validator(mode="after")
def validate_snapshot_id(self) -> Self:
self.snapshot_id = self.statistics.snapshot_id
return self
```
nit: use direct assignment
##########
pyiceberg/table/update/__init__.py:
##########
Review Comment:
add a test, something like:
```
def test_set_statistics_update_without_snapshot_id(table_v2_with_statistics:
Table) -> None:
"""Test that SetStatisticsUpdate auto-populates snapshot_id from
statistics."""
snapshot_id = table_v2_with_statistics.metadata.current_snapshot_id
blob_metadata = BlobMetadata(
type="apache-datasketches-theta-v1",
snapshot_id=snapshot_id,
sequence_number=2,
fields=[1],
properties={"prop-key": "prop-value"},
)
statistics_file = StatisticsFile(
snapshot_id=snapshot_id,
statistics_path="s3://bucket/warehouse/stats.puffin",
file_size_in_bytes=124,
file_footer_size_in_bytes=27,
blob_metadata=[blob_metadata],
)
# Create update without explicitly providing snapshot_id
update = SetStatisticsUpdate(statistics=statistics_file)
# Validator should auto-populate snapshot_id from statistics
assert update.snapshot_id == snapshot_id
new_metadata = update_table_metadata(
table_v2_with_statistics.metadata,
(update,),
)
assert len(new_metadata.statistics) == 2
updated_statistics = [stat for stat in new_metadata.statistics if
stat.snapshot_id == snapshot_id]
assert len(updated_statistics) == 1
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]