Re: [PR] feat: sql catalog support update table [iceberg-rust]

via GitHub Wed, 21 May 2025 12:48:44 -0700


DerGut commented on code in PR #862:
URL: https://github.com/apache/iceberg-rust/pull/862#discussion_r2100988445



##########
crates/catalog/sql/src/catalog.rs:
##########
@@ -769,23 +763,94 @@ impl Catalog for SqlCatalog {
         Ok(())
     }
 
-    async fn update_table(&self, _commit: TableCommit) -> Result<Table> {
-        Err(Error::new(
-            ErrorKind::FeatureUnsupported,
-            "Updating a table is not supported yet",
-        ))
+    async fn update_table(&self, mut commit: TableCommit) -> Result<Table> {
+        let identifier = commit.identifier().clone();
+        if !self.table_exists(&identifier).await? {
+            return no_such_table_err(&identifier);
+        }
+
+        let requirements = commit.take_requirements();
+        let table_updates = commit.take_updates();
+
+        let table = self.load_table(&identifier).await?;
+        let mut update_table_metadata_builder =
+            TableMetadataBuilder::new_from_metadata(table.metadata().clone(), 
None);
+
+        for table_update in table_updates {
+            update_table_metadata_builder = 
table_update.apply(update_table_metadata_builder)?;
+        }
+
+        for table_requirement in requirements {

Review Comment:
   I believe it would also make sense to explicitly set a transaction isolation 
level of _repeatable read_. Postgres for example, defaults to _read committed_ 
which can similarly get us into a broken table state:
   
   _read committed_ allows us to see different versions of the same row between 
the SELECT statement (that we use to validate the commit requirements) and the 
UPDATE statement. Effectively, a concurrently running conflicting update 
operation that commits between SELECT and UPDATE will still allow our UPDATE to 
succeed. We were not able to re-check the new table requirements but only 
checked the old ones -> we end up in a broken state.
   
   With _repeatable read_ on the other hand, the UPDATE will safely fail with a 
serialization error.



##########
crates/catalog/sql/src/catalog.rs:
##########
@@ -769,23 +763,94 @@ impl Catalog for SqlCatalog {
         Ok(())
     }
 
-    async fn update_table(&self, _commit: TableCommit) -> Result<Table> {
-        Err(Error::new(
-            ErrorKind::FeatureUnsupported,
-            "Updating a table is not supported yet",
-        ))
+    async fn update_table(&self, mut commit: TableCommit) -> Result<Table> {
+        let identifier = commit.identifier().clone();
+        if !self.table_exists(&identifier).await? {
+            return no_such_table_err(&identifier);
+        }
+
+        let requirements = commit.take_requirements();
+        let table_updates = commit.take_updates();
+
+        let table = self.load_table(&identifier).await?;
+        let mut update_table_metadata_builder =
+            TableMetadataBuilder::new_from_metadata(table.metadata().clone(), 
None);
+
+        for table_update in table_updates {
+            update_table_metadata_builder = 
table_update.apply(update_table_metadata_builder)?;
+        }
+
+        for table_requirement in requirements {

Review Comment:
   I believe it would also make sense to explicitly set a transaction isolation 
level of _repeatable read_. Postgres for example, defaults to _read committed_ 
which can similarly get us into a broken table state:
   
   _read committed_ allows us to see different versions of the same row between 
the SELECT statement (that we use to validate the commit requirements) and the 
UPDATE statement. Effectively, a concurrently running conflicting update 
operation that commits between SELECT and UPDATE will still allow our UPDATE to 
succeed. We were not able to re-check the new table requirements but only 
checked the old ones -> we end up in a broken state.
   
   With _repeatable read_ on the other hand, the UPDATE should safely fail with 
a serialization error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: sql catalog support update table [iceberg-rust]

Reply via email to