chinmay-bhat commented on code in PR #758: URL: https://github.com/apache/iceberg-python/pull/758#discussion_r1667693132
########## pyiceberg/table/__init__.py: ########## @@ -1956,6 +1957,10 @@ def _commit(self) -> UpdatesAndRequirements: """Apply the pending changes and commit.""" return self._updates, self._requirements + def _commit_if_ref_updates_exist(self) -> None: + self.commit() + self._updates, self._requirements = (), () Review Comment: Hi, I'm re-opening this resolved conversation, since I don't think adding the additional parameter is enough. Say, in the future, we have more APIs like: ```python branch_name, min_snapshots_to_keep = "test_branch_min_snapshots_to_keep", 2 with tbl.manage_snapshots() as ms: ms.create_branch(branch_name=branch_name, snapshot_id=snapshot_id) ms.set_min_snapshots_to_keep(branch_name=branch_name, min_snapshots_to_keep=min_snapshots_to_keep) ``` The updates and requirements would be : <code> (SetSnapshotRefUpdate(action='set-snapshot-ref', ref_name='test_branch_min_snapshots_to_keep', type='branch', snapshot_id=71191752302974125, max_ref_age_ms=None, max_snapshot_age_ms=None, min_snapshots_to_keep=None), SetSnapshotRefUpdate(action='set-snapshot-ref', ref_name='test_branch_min_snapshots_to_keep', type='branch', snapshot_id=71191752302974125, max_ref_age_ms=None, max_snapshot_age_ms=None, min_snapshots_to_keep=2)) (AssertRefSnapshotId(type='assert-ref-snapshot-id', ref='test_branch_min_snapshots_to_keep', snapshot_id=None), AssertRefSnapshotId(type='assert-ref-snapshot-id', ref='test_branch_min_snapshots_to_keep', snapshot_id=71191752302974125)) </code> The 2nd requirement will end up as a `CommitFailedException`as the branch would be missing. With `transaction._apply`, the `table_metadata` would get updated with every `_commit_if_ref_updates_exist()`, but when the transaction exits, it will try to `commit_transaction` which runs `_do_commit()` which runs `_commit_table()`. In `_commit_table()`, for non-REST catalogs, we `_update_and_stage_table()` where we check the requirements with current table metadata, here the 2nd requirement fails. To fix this, we might consider one of the following solutions: 1. in `transaction._apply` combine all the ref updates, identify the differences between current table metadata and staged metadata, and only pass those differences in `self._updates`, while not sending the ref updates requirements (since we've already validated them once in `transaction._apply`) OR 2. improve `_update_and_stage_table()` to iteratively apply the update with corresponding requirement and always check the requirements with updated_metadata. This is easier than (1), but only serves non-REST catalogs. OR 3. continue the original implementation, i.e. for every `commit_if_ref_exists()`, the Transaction commits to the table. This would be expensive IMO, but the result would remain atomic and correct, with minimal changes in the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org