chinmay-bhat commented on code in PR #758:
URL: https://github.com/apache/iceberg-python/pull/758#discussion_r1641646010


##########
pyiceberg/table/__init__.py:
##########
@@ -2010,6 +2016,84 @@ def create_branch(
         self._requirements += requirement
         return self
 
+    def rollback_to_snapshot(self, snapshot_id: int) -> ManageSnapshots:
+        """Rollback the table to the given snapshot id.
+
+         The snapshot needs to be an ancestor of the current table state.
+
+        Args:
+            snapshot_id (int): rollback to this snapshot_id that used to be current.
+        Returns:
+            This for method chaining
+        """
+        self._commit_if_ref_updates_exist()
+        if self._transaction._table.snapshot_by_id(snapshot_id) is None:
+            raise ValidationError(f"Cannot roll back to unknown snapshot id: {snapshot_id}")
+        if snapshot_id not in {
+            ancestor.snapshot_id
+            for ancestor in ancestors_of(self._transaction._table.current_snapshot(), self._transaction.table_metadata)
+        }:
+            raise ValidationError(f"Cannot roll back to snapshot, not an ancestor of the current state: {snapshot_id}")
+
+        update, requirement = self._transaction._set_ref_snapshot(snapshot_id=snapshot_id, ref_name="main", type="branch")
+        self._updates += update
+        self._requirements += requirement
+        return self
+
+    def rollback_to_timestamp(self, timestamp: int) -> ManageSnapshots:
+        """Rollback the table to the snapshot right before the given timestamp.
+
+        The snapshot needs to be an ancestor of the current table state.
+
+        Args:
+            timestamp (int): rollback to the snapshot that used to be current right before this timestamp.
+        Returns:
+            This for method chaining
+        """
+        self._commit_if_ref_updates_exist()
+        if (
+            snapshot := ancestor_right_before_timestamp(
+                self._transaction._table.current_snapshot(), self._transaction.table_metadata, timestamp

Review Comment:
   @HonahX We had previously discussed and agreed to use `tbl.history()` in [this PR comment](https://github.com/apache/iceberg-python/pull/748#discussion_r1616606353) instead of creating a function like `find_latest_ancestor_older_than_timestamp()`.
   
   > Also, how about implementing this by iterating over table.history()? The snapshot_log field in metadata contains a list of snapshot_id + timestamp pair so we do not need to re-generate the ancestors for current snapshot.
   
   I just realized we can't do that since `tbl.history()` doesn't always 
contain ONLY ancestors of the **_current table state_**. 
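
For illustration, here is a minimal sketch using a hypothetical, simplified model. A plain `Snapshot` dataclass and a list of `(snapshot_id, timestamp_ms)` pairs stand in for the real snapshot and snapshot-log types, and all IDs and timestamps are made up. It performs the same kind of membership check that `rollback_to_snapshot` does in the diff: it collects the IDs reachable from the current snapshot through parent pointers, then looks for log entries outside that set.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for a table snapshot: only the fields needed here.
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    timestamp_ms: int


def ancestor_ids(current_id: Optional[int], snapshots: Dict[int, Snapshot]) -> Set[int]:
    """Collect the IDs reachable from the current snapshot via parent pointers."""
    ids: Set[int] = set()
    while current_id is not None and current_id in snapshots:
        ids.add(current_id)
        current_id = snapshots[current_id].parent_snapshot_id
    return ids


# Made-up graph: 3 and 4 were both committed on top of 2; 4 is current.
snapshots = {
    1: Snapshot(1, None, 100),
    2: Snapshot(2, 1, 200),
    3: Snapshot(3, 2, 300),
    4: Snapshot(4, 2, 400),
}
# Toy snapshot log: every snapshot that was ever made current, in order.
history = [(1, 100), (2, 200), (3, 300), (4, 400)]

current = ancestor_ids(4, snapshots)
not_ancestors = [sid for sid, _ in history if sid not in current]
print(sorted(current), not_ancestors)  # [1, 2, 4] [3]
```

Snapshot 3 appears in the log but is unreachable from the current snapshot. That is the kind of entry a history-only lookup could pick.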
   
   For reference, see this `tbl.history()` example in the [docs](https://iceberg.apache.org/docs/nightly/spark-queries/?h=ancestor#history). There are 2 snapshots that have the same parent, but one is _**NOT**_ an ancestor of the current table state. If we used `tbl.history()`, it would be possible to roll back to the non-ancestor snapshot, which `rollback()` should not allow. So we do need to re-generate the ancestors of the current snapshot, and I've written `ancestor_right_before_timestamp()` to do just that.
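
The difference between the two lookups can be sketched on a made-up graph where snapshots 3 and 4 share parent 2 and 4 is current. This is a hypothetical, simplified model. `Snapshot`, `ancestors`, `ancestor_before` and `history_before` are illustrative names only. They are not PyIceberg's classes or the PR's actual `ancestor_right_before_timestamp()`.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified snapshot model (not PyIceberg's Snapshot class).
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    timestamp_ms: int


def ancestors(current_id: Optional[int], snapshots: Dict[int, Snapshot]) -> Iterator[Snapshot]:
    """Yield the current snapshot, then its parent, grandparent, and so on."""
    while current_id is not None and current_id in snapshots:
        snapshot = snapshots[current_id]
        yield snapshot
        current_id = snapshot.parent_snapshot_id


def ancestor_before(
    current_id: Optional[int], snapshots: Dict[int, Snapshot], timestamp_ms: int
) -> Optional[Snapshot]:
    """Newest ancestor of the current snapshot committed strictly before timestamp_ms."""
    best: Optional[Snapshot] = None
    for snapshot in ancestors(current_id, snapshots):
        # Keep the newest qualifying ancestor instead of assuming timestamps decrease.
        if snapshot.timestamp_ms < timestamp_ms and (best is None or snapshot.timestamp_ms > best.timestamp_ms):
            best = snapshot
    return best


def history_before(history: List[Tuple[int, int]], timestamp_ms: int) -> Optional[int]:
    """History-only lookup: newest log entry before timestamp_ms, ancestor or not."""
    candidates = [(ts, sid) for sid, ts in history if ts < timestamp_ms]
    return max(candidates)[1] if candidates else None


snapshots = {
    1: Snapshot(1, None, 100),
    2: Snapshot(2, 1, 200),
    3: Snapshot(3, 2, 300),
    4: Snapshot(4, 2, 400),
}
history = [(1, 100), (2, 200), (3, 300), (4, 400)]

print(history_before(history, 350))  # 3, which is not an ancestor of current snapshot 4
found = ancestor_before(4, snapshots, 350)
print(found.snapshot_id if found else None)  # 2
```

The strict `<` comparison and the choice of the newest qualifying ancestor are assumptions. They are intended to mirror the semantics of Java's `SnapshotUtil.findLatestAncestorOlderThan`.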
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

