HonahX commented on code in PR #728:
URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1611069223
##########
tests/table/test_init.py:
##########
@@ -652,6 +652,58 @@ def test_update_metadata_add_snapshot(table_v2: Table) ->
None:
assert new_metadata.last_updated_ms == new_snapshot.timestamp_ms
+def test_update_metadata_create_tag(table_v2: Table) -> None:
+ update =
table_v2.manage_snapshots().create_tag(snapshot_id=3051729675574597004,
tag_name="test123")
+
+ new_metadata = update.table_metadata
Review Comment:
To better simulate the real use case, shall we first commit the transaction
and use the new table's metadata in the assertion?
##########
pyiceberg/table/__init__.py:
##########
@@ -277,6 +279,50 @@ def __init__(self, table: Table, autocommit: bool = False):
self._autocommit = autocommit
self._updates = ()
self._requirements = ()
+ self._manage_snapshots = Transaction.ManageSnapshot(self)
+
+ class ManageSnapshot:
Review Comment:
With the current implementation, if users call the
`table.managed_snapshots()`, and get a `ManageSnapshots` object, they either
have to manually commit the transaction after they create some tags/branches or
do something like
```python
with table.manage_snapshots().create_tag(...) as txn:
txn.manage_snapshots().create_branch(...)
```
I am thinking that `ManageSnapshots` can be an outer class like
[UpdaetSchema](https://github.com/apache/iceberg-python/blob/55bbc1863aba6646b2dc36573355aabb11909095/pyiceberg/table/__init__.py#L1920)
and implements the UpdateTableMetadata:
https://github.com/apache/iceberg-python/blob/55bbc1863aba6646b2dc36573355aabb11909095/pyiceberg/table/__init__.py#L1898-L1916
There are several benefits:
1. We could do multiple tags and branch creations and then _apply once to
reduce the number of `transaction._apply()` call since that will trigger
requirement validations every time. This also allow us to avoid duplicated
requirements being added such as `AssertTableUUID`
2. Users can utilize context manger if they use `table.manage_snapshots()`
```python
with table.manage_snapshots() as ms:
ms.createTag("Tag_A", 0)
ms.createTag("Tag_B", 1)
```
We could utilize Transaction's autocommit option when implementing
`table.manage_snapshots`, like:
https://github.com/apache/iceberg-python/blob/55bbc1863aba6646b2dc36573355aabb11909095/pyiceberg/table/__init__.py#L1407-L1422
We could also make create_tag returns `ManageSnapshots` itself to support
chain operations:
```python
with table.manage_snapshots() as ms:
ms.createTag("Tag_A", 0).createTag("Tag_B", 1)
ms.createBranch("Branch_A", 0).createBranch("branch_B", 1)
```
What do you think?
##########
pyiceberg/table/__init__.py:
##########
@@ -340,6 +386,66 @@ def set_properties(self, properties: Properties =
EMPTY_DICT, **kwargs: Any) ->
updates = properties or kwargs
return self._apply((SetPropertiesUpdate(updates=updates),))
+ @deprecated(
+ deprecated_in="0.7.0",
+ removed_in="0.7.0",
Review Comment:
I think we should deprecate these in 0.7.0 and remove this in 0.8.0 so users
at least have one released version to react on this deprecation.
##########
pyiceberg/table/__init__.py:
##########
@@ -277,6 +279,50 @@ def __init__(self, table: Table, autocommit: bool = False):
self._autocommit = autocommit
self._updates = ()
self._requirements = ()
+ self._manage_snapshots = Transaction.ManageSnapshot(self)
+
+ class ManageSnapshot:
Review Comment:
```suggestion
class ManageSnapshots:
```
How about using the plural form for the name?
##########
pyiceberg/table/__init__.py:
##########
@@ -340,6 +386,66 @@ def set_properties(self, properties: Properties =
EMPTY_DICT, **kwargs: Any) ->
updates = properties or kwargs
return self._apply((SetPropertiesUpdate(updates=updates),))
+ @deprecated(
+ deprecated_in="0.7.0",
+ removed_in="0.7.0",
+ help_message="Please use one of the functions in
transaction.manage_snapshots instead",
+ )
+ def add_snapshot(self, snapshot: Snapshot) -> None:
+ pass
Review Comment:
I think we should keep deprecated APIs functional until we remove it
completely.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]