chinmay-bhat commented on code in PR #758:
URL: https://github.com/apache/iceberg-python/pull/758#discussion_r1667693132


##########
pyiceberg/table/__init__.py:
##########
@@ -1956,6 +1957,10 @@ def _commit(self) -> UpdatesAndRequirements:
         """Apply the pending changes and commit."""
         return self._updates, self._requirements
 
+    def _commit_if_ref_updates_exist(self) -> None:
+        self.commit()
+        self._updates, self._requirements = (), ()

Review Comment:
   Hi, I'm re-opening this resolved conversation, since I don't think adding 
the additional parameter is enough.
   
   Say, in the future, we have more APIs like:
   ```python
   branch_name, min_snapshots_to_keep = "test_branch_min_snapshots_to_keep", 2
   with tbl.manage_snapshots() as ms:
           ms.create_branch(branch_name=branch_name, snapshot_id=snapshot_id)
           ms.set_min_snapshots_to_keep(branch_name=branch_name, 
min_snapshots_to_keep=min_snapshots_to_keep)
   ```
   
   The updates and requirements would be :
   <code> (SetSnapshotRefUpdate(action='set-snapshot-ref', 
ref_name='test_branch_min_snapshots_to_keep', type='branch', 
snapshot_id=71191752302974125, max_ref_age_ms=None, max_snapshot_age_ms=None, 
min_snapshots_to_keep=None), SetSnapshotRefUpdate(action='set-snapshot-ref', 
ref_name='test_branch_min_snapshots_to_keep', type='branch', 
snapshot_id=71191752302974125, max_ref_age_ms=None, max_snapshot_age_ms=None, 
min_snapshots_to_keep=2))
   
   (AssertRefSnapshotId(type='assert-ref-snapshot-id', 
ref='test_branch_min_snapshots_to_keep', snapshot_id=None), 
AssertRefSnapshotId(type='assert-ref-snapshot-id', 
ref='test_branch_min_snapshots_to_keep', snapshot_id=71191752302974125)) </code>
   
   The 2nd requirement will end up as a `CommitFailedException`as the branch 
would be missing. 
   With `transaction._apply`, the `table_metadata` would get updated with every 
`_commit_if_ref_updates_exist()`, but when the transaction exits, it will try 
to `commit_transaction` which runs `_do_commit()` which runs `_commit_table()`. 
   
   In `_commit_table()`, for non-REST catalogs, we `_update_and_stage_table()` 
where we check the requirements with current table metadata, here the 2nd 
requirement fails.
   
   To fix this, we might consider one of the following solutions:
   
   1. in `transaction._apply` combine all the ref updates, identify the 
differences between current table metadata and staged metadata, and only pass 
those differences in `self._updates`, while not sending the ref updates 
requirements (since we've already validated them once in `transaction._apply`) 
OR
   2. improve `_update_and_stage_table()` to iteratively apply the update with 
corresponding requirement and always check the requirements with 
updated_metadata. This is easier than (1), but only serves non-REST catalogs. OR
   3. continue the original implementation, i.e. for every 
`commit_if_ref_exists()`, the Transaction commits to the table. This would be 
expensive IMO, but the result would remain atomic and correct, with minimal 
changes in the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to