HonahX commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1611069223
########## tests/table/test_init.py: ########## @@ -652,6 +652,58 @@ def test_update_metadata_add_snapshot(table_v2: Table) -> None: assert new_metadata.last_updated_ms == new_snapshot.timestamp_ms +def test_update_metadata_create_tag(table_v2: Table) -> None: + update = table_v2.manage_snapshots().create_tag(snapshot_id=3051729675574597004, tag_name="test123") + + new_metadata = update.table_metadata Review Comment: To better simulate the real use case, shall we first commit the transaction and use the new table's metadata in the assertion? ########## pyiceberg/table/__init__.py: ########## @@ -277,6 +279,50 @@ def __init__(self, table: Table, autocommit: bool = False): self._autocommit = autocommit self._updates = () self._requirements = () + self._manage_snapshots = Transaction.ManageSnapshot(self) + + class ManageSnapshot: Review Comment: With the current implementation, if users call the `table.managed_snapshots()`, and get a `ManageSnapshots` object, they either have to manually commit the transaction after they create some tags/branches or do something like ```python with table.manage_snapshots().create_tag(...) as txn: txn.manage_snapshots().create_branch(...) ``` I am thinking that `ManageSnapshots` can be an outer class like [UpdaetSchema](https://github.com/apache/iceberg-python/blob/55bbc1863aba6646b2dc36573355aabb11909095/pyiceberg/table/__init__.py#L1920) and implements the UpdateTableMetadata: https://github.com/apache/iceberg-python/blob/55bbc1863aba6646b2dc36573355aabb11909095/pyiceberg/table/__init__.py#L1898-L1916 There are several benefits: 1. We could do multiple tags and branch creations and then _apply once to reduce the number of `transaction._apply()` call since that will trigger requirement validations every time. This also allow us to avoid duplicated requirements being added such as `AssertTableUUID` 2. Users can utilize context manger if they use `table.manage_snapshots()` ```python with table.manage_snapshots() as ms: ms.createTag("Tag_A", 0) ms.createTag("Tag_B", 1) ``` We could utilize Transaction's autocommit option when implementing `table.manage_snapshots`, like: https://github.com/apache/iceberg-python/blob/55bbc1863aba6646b2dc36573355aabb11909095/pyiceberg/table/__init__.py#L1407-L1422 We could also make create_tag returns `ManageSnapshots` itself to support chain operations: ```python with table.manage_snapshots() as ms: ms.createTag("Tag_A", 0).createTag("Tag_B", 1) ms.createBranch("Branch_A", 0).createBranch("branch_B", 1) ``` What do you think? ########## pyiceberg/table/__init__.py: ########## @@ -340,6 +386,66 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = properties or kwargs return self._apply((SetPropertiesUpdate(updates=updates),)) + @deprecated( + deprecated_in="0.7.0", + removed_in="0.7.0", Review Comment: I think we should deprecate these in 0.7.0 and remove this in 0.8.0 so users at least have one released version to react on this deprecation. ########## pyiceberg/table/__init__.py: ########## @@ -277,6 +279,50 @@ def __init__(self, table: Table, autocommit: bool = False): self._autocommit = autocommit self._updates = () self._requirements = () + self._manage_snapshots = Transaction.ManageSnapshot(self) + + class ManageSnapshot: Review Comment: ```suggestion class ManageSnapshots: ``` How about using the plural form for the name? ########## pyiceberg/table/__init__.py: ########## @@ -340,6 +386,66 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = properties or kwargs return self._apply((SetPropertiesUpdate(updates=updates),)) + @deprecated( + deprecated_in="0.7.0", + removed_in="0.7.0", + help_message="Please use one of the functions in transaction.manage_snapshots instead", + ) + def add_snapshot(self, snapshot: Snapshot) -> None: + pass Review Comment: I think we should keep deprecated APIs functional until we remove it completely. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org