chinmay-bhat commented on code in PR #758:
URL: https://github.com/apache/iceberg-python/pull/758#discussion_r1667693132
##########
pyiceberg/table/__init__.py:
##########
@@ -1956,6 +1957,10 @@ def _commit(self) -> UpdatesAndRequirements:
"""Apply the pending changes and commit."""
return self._updates, self._requirements
+ def _commit_if_ref_updates_exist(self) -> None:
+ self.commit()
+ self._updates, self._requirements = (), ()
Review Comment:
Hi, I'm re-opening this resolved conversation, since I don't think adding
the additional parameter is enough.
Say, in the future, we have more APIs like:
```python
branch_name, min_snapshots_to_keep = "test_branch_min_snapshots_to_keep", 2
with tbl.manage_snapshots() as ms:
ms.create_branch(branch_name=branch_name, snapshot_id=snapshot_id)
ms.set_min_snapshots_to_keep(branch_name=branch_name,
min_snapshots_to_keep=min_snapshots_to_keep)
```
The updates and requirements would be :
<code> (SetSnapshotRefUpdate(action='set-snapshot-ref',
ref_name='test_branch_min_snapshots_to_keep', type='branch',
snapshot_id=71191752302974125, max_ref_age_ms=None, max_snapshot_age_ms=None,
min_snapshots_to_keep=None), SetSnapshotRefUpdate(action='set-snapshot-ref',
ref_name='test_branch_min_snapshots_to_keep', type='branch',
snapshot_id=71191752302974125, max_ref_age_ms=None, max_snapshot_age_ms=None,
min_snapshots_to_keep=2))
(AssertRefSnapshotId(type='assert-ref-snapshot-id',
ref='test_branch_min_snapshots_to_keep', snapshot_id=None),
AssertRefSnapshotId(type='assert-ref-snapshot-id',
ref='test_branch_min_snapshots_to_keep', snapshot_id=71191752302974125)) </code>
The 2nd requirement will end up as a `CommitFailedException`as the branch
would be missing.
With `transaction._apply`, the `table_metadata` would get updated with every
`_commit_if_ref_updates_exist()`, but when the transaction exits, it will try
to `commit_transaction` which runs `_do_commit()` which runs `_commit_table()`.
In `_commit_table()`, for non-REST catalogs, we `_update_and_stage_table()`
where we check the requirements with current table metadata, here the 2nd
requirement fails.
To fix this, we might consider one of the following solutions:
1. in `transaction._apply` combine all the ref updates, identify the
differences between current table metadata and staged metadata, and only pass
those differences in `self._updates`, while not sending the ref updates
requirements (since we've already validated them once in `transaction._apply`)
OR
2. improve `_update_and_stage_table()` to iteratively apply the update with
corresponding requirement and always check the requirements with
updated_metadata. This is easier than (1), but only serves non-REST catalogs. OR
3. continue the original implementation, i.e. for every
`commit_if_ref_exists()`, the Transaction commits to the table. This would be
expensive IMO, but the result would remain atomic and correct, with minimal
changes in the PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]