syun64 commented on PR #473: URL: https://github.com/apache/iceberg-python/pull/473#issuecomment-1965729096
> Thanks for the great catch @syun64! My understanding is that we need to write `current_snapshot_id` as `-1` when serializing the new metadata object to JSON. Would it be better to directly update the serializer to support this backwards compatibility? [ToOutputFile.table_metadata](https://github.com/apache/iceberg-python/blob/main/pyiceberg/serializers.py#L118-L132)
>
> Internally, either `-1` or `None` can represent "no current snapshot id". But I think `None` is better as it aligns more with the [spec](https://iceberg.apache.org/spec/#table-metadata-fields), which states that `current_snapshot_id` is optional, and makes things more intuitive. WDYT?

Great suggestion @HonahX. I tried adding a [@field_serializer](https://docs.pydantic.dev/latest/api/functional_serializers/#pydantic.functional_serializers.field_serializer) to the `TableMetadataV1` class to test it out, but unfortunately the output of `model_dump_json` still doesn't serialize a `None` value as `-1` as we'd want. I'm still trying to figure out what in our pydantic class definition is preventing the custom serializer from taking effect. For example:

```
from typing import Optional
from pydantic import field_serializer

class TableMetadataV1(TableMetadataCommonFields, IcebergBaseModel):
    ...

    @field_serializer('current_snapshot_id')
    def serialize_current_snapshot_id(self, current_snapshot_id: Optional[int]):
        # Write -1 instead of null when there is no current snapshot.
        return current_snapshot_id if current_snapshot_id is not None else -1
```
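
For reference, the behavior under discussion (keep `None` in memory, write `-1` only at the JSON boundary) can be sketched with the standard library alone. This is a rough illustration, not pyiceberg's actual serializer: `dump_v1_metadata`, `load_current_snapshot_id`, and `NO_SNAPSHOT_SENTINEL` are hypothetical names, and the metadata is modeled as a plain dict rather than a pydantic model.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical constant: the value written when there is no current snapshot.
NO_SNAPSHOT_SENTINEL = -1


def dump_v1_metadata(fields: Dict[str, Any]) -> str:
    """Serialize metadata fields, writing -1 when the snapshot id is None or absent."""
    out = dict(fields)  # shallow copy so the caller's dict is left untouched
    if out.get("current-snapshot-id") is None:
        out["current-snapshot-id"] = NO_SNAPSHOT_SENTINEL
    return json.dumps(out)


def load_current_snapshot_id(payload: str) -> Optional[int]:
    """Read the snapshot id back, mapping both null and -1 to None."""
    raw = json.loads(payload).get("current-snapshot-id")
    return None if raw is None or raw == NO_SNAPSHOT_SENTINEL else raw
```

With this shape, the in-memory value stays `None` and the sentinel only appears in the written JSON, so a round trip through both helpers gives `None` back.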