DerGut commented on code in PR #862: URL: https://github.com/apache/iceberg-rust/pull/862#discussion_r2100988445
########## crates/catalog/sql/src/catalog.rs: ########## @@ -769,23 +763,94 @@ impl Catalog for SqlCatalog { Ok(()) } - async fn update_table(&self, _commit: TableCommit) -> Result<Table> { - Err(Error::new( - ErrorKind::FeatureUnsupported, - "Updating a table is not supported yet", - )) + async fn update_table(&self, mut commit: TableCommit) -> Result<Table> { + let identifier = commit.identifier().clone(); + if !self.table_exists(&identifier).await? { + return no_such_table_err(&identifier); + } + + let requirements = commit.take_requirements(); + let table_updates = commit.take_updates(); + + let table = self.load_table(&identifier).await?; + let mut update_table_metadata_builder = + TableMetadataBuilder::new_from_metadata(table.metadata().clone(), None); + + for table_update in table_updates { + update_table_metadata_builder = table_update.apply(update_table_metadata_builder)?; + } + + for table_requirement in requirements { Review Comment: I believe it would also make sense to explicitly set a transaction isolation level of _repeatable read_. Postgres for example, defaults to _read committed_ which can similarly get us into a broken table state: _read committed_ allows us to see different versions of the same row between the SELECT statement (that we use to validate the commit requirements) and the UPDATE statement. Effectively, a concurrently running conflicting update operation that commits between SELECT and UPDATE will still allow our UPDATE to succeed. We were not able to re-check the new table requirements but only checked the old ones -> we end up in a broken state. With _repeatable read_ on the other hand, the UPDATE will safely fail with a serialization error. ########## crates/catalog/sql/src/catalog.rs: ########## @@ -769,23 +763,94 @@ impl Catalog for SqlCatalog { Ok(()) } - async fn update_table(&self, _commit: TableCommit) -> Result<Table> { - Err(Error::new( - ErrorKind::FeatureUnsupported, - "Updating a table is not supported yet", - )) + async fn update_table(&self, mut commit: TableCommit) -> Result<Table> { + let identifier = commit.identifier().clone(); + if !self.table_exists(&identifier).await? { + return no_such_table_err(&identifier); + } + + let requirements = commit.take_requirements(); + let table_updates = commit.take_updates(); + + let table = self.load_table(&identifier).await?; + let mut update_table_metadata_builder = + TableMetadataBuilder::new_from_metadata(table.metadata().clone(), None); + + for table_update in table_updates { + update_table_metadata_builder = table_update.apply(update_table_metadata_builder)?; + } + + for table_requirement in requirements { Review Comment: I believe it would also make sense to explicitly set a transaction isolation level of _repeatable read_. Postgres for example, defaults to _read committed_ which can similarly get us into a broken table state: _read committed_ allows us to see different versions of the same row between the SELECT statement (that we use to validate the commit requirements) and the UPDATE statement. Effectively, a concurrently running conflicting update operation that commits between SELECT and UPDATE will still allow our UPDATE to succeed. We were not able to re-check the new table requirements but only checked the old ones -> we end up in a broken state. With _repeatable read_ on the other hand, the UPDATE should safely fail with a serialization error. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org