Fokko commented on code in PR #6323:
URL: https://github.com/apache/iceberg/pull/6323#discussion_r1218658583


##########
python/pyiceberg/table/__init__.py:
##########
@@ -69,21 +72,313 @@
     import ray
     from duckdb import DuckDBPyConnection
 
+    from pyiceberg.catalog import Catalog
 
 ALWAYS_TRUE = AlwaysTrue()
 
 
+class TableUpdates:
+    _table: Table
+    _updates: Tuple[TableUpdate, ...]
+    _requirements: Tuple[TableRequirement, ...]
+
+    def __init__(
+        self,
+        table: Table,
+        actions: Optional[Tuple[TableUpdate, ...]] = None,
+        requirements: Optional[Tuple[TableRequirement, ...]] = None,
+    ):
+        self._table = table
+        self._updates = actions or ()
+        self._requirements = requirements or ()
+
+    def _append_updates(self, *new_updates: TableUpdate) -> TableUpdates:
+        """Appends updates to the set of staged updates
+
+        Args:
+            *new_updates: Any new updates
+
+        Raises:
+            ValueError: When the type of update is not unique.
+
+        Returns:
+            A new AlterTable object with the new updates appended
+        """
+        for new_update in new_updates:
+            type_new_update = type(new_update)
+            if any(type(update) == type_new_update for update in self._updates):
+                raise ValueError(f"Updates in a single commit need to be unique, duplicate: {type_new_update}")

Review Comment:
   The point of this check is to prevent several operations of the same type 
within one transaction. I agree that when you change a schema, all the changes 
to the schema are accumulated into a single `AddSchemaUpdate`. If you then try 
to add another update of the same type to the transaction, it raises the 
`ValueError` shown above.
   
   The whole public API is currently:
   
   ```python
   table.new_transaction.set_table_version(2).commit()
   table.new_transaction.set_properties(**{
       "lifecycle": "true"
   }).commit()
   table.new_transaction.remove_properties("lifecycle").commit()
   table.new_transaction.update_location("s3://...").commit()
   ```
   
   And you can combine them:
   ```python
   table.new_transaction.set_table_version(2).update_location("s3://...").commit()
   ```
   
   Combining multiple updates of the same type raises a `ValueError`:
   ```python
   table.new_transaction.set_table_version(2).set_table_version(2).commit()
   ```
   I think this will guard us from getting into nasty situations. We can always 
relax this in the future to allow multiple snapshots, but then the requirements 
should be in order as well.
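
To make the behaviour concrete, here is a minimal standalone sketch of the uniqueness check. It does not use the real PyIceberg classes: `StagedUpdates`, `UpgradeFormatVersion` and `SetLocation` are hypothetical stand-ins, and unlike the diff above it also rejects duplicates passed within a single call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple


class TableUpdate:
    """Hypothetical stand-in for the TableUpdate base class in the PR."""


@dataclass(frozen=True)
class UpgradeFormatVersion(TableUpdate):  # hypothetical update type
    format_version: int


@dataclass(frozen=True)
class SetLocation(TableUpdate):  # hypothetical update type
    location: str


class StagedUpdates:
    """Immutable collection of staged updates, at most one per update type."""

    def __init__(self, updates: Tuple[TableUpdate, ...] = ()) -> None:
        self._updates = updates

    def append(self, *new_updates: TableUpdate) -> StagedUpdates:
        staged = self._updates
        for new_update in new_updates:
            type_new_update = type(new_update)
            # Reject a second update of a type that is already staged.
            if any(type(update) is type_new_update for update in staged):
                raise ValueError(
                    f"Updates in a single commit need to be unique, duplicate: {type_new_update}"
                )
            staged += (new_update,)
        # Return a new object so the original stays untouched.
        return StagedUpdates(staged)


# Two different update types can be combined.
combined = StagedUpdates().append(UpgradeFormatVersion(2)).append(SetLocation("s3://..."))

# A second update of an already staged type is rejected.
try:
    combined.append(UpgradeFormatVersion(2))
except ValueError as exc:
    print(exc)
```

Because `append` returns a new object instead of mutating in place, a rejected update leaves the previously staged updates unchanged.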


