HonahX commented on code in PR #265:
URL: https://github.com/apache/iceberg-python/pull/265#discussion_r1451680767


##########
pyiceberg/catalog/sql.py:
##########
@@ -268,16 +269,32 @@ def drop_table(self, identifier: Union[str, Identifier]) -> None:
         identifier_tuple = self.identifier_to_tuple_without_catalog(identifier)
         database_name, table_name = self.identifier_to_database_and_table(identifier_tuple, NoSuchTableError)
         with Session(self.engine) as session:
-            res = session.execute(
-                delete(IcebergTables).where(
-                    IcebergTables.catalog_name == self.name,
-                    IcebergTables.table_namespace == database_name,
-                    IcebergTables.table_name == table_name,
+            if self.engine.dialect.supports_sane_rowcount:
+                res = session.execute(
+                    delete(IcebergTables).where(
+                        IcebergTables.catalog_name == self.name,
+                        IcebergTables.table_namespace == database_name,
+                        IcebergTables.table_name == table_name,
+                    )
                 )
-            )
+                if res.rowcount < 1:
+                    raise NoSuchTableError(f"Table does not exist: {database_name}.{table_name}")
+            else:
+                try:
+                    tbl = (
+                        session.query(IcebergTables)
+                        .with_for_update(of=IcebergTables, nowait=True)

Review Comment:
   Thanks for the explanation! Reflecting further, I think using neither NOWAIT nor SKIP_LOCKED might be better for maintaining consistency. Currently, when the engine supports accurate rowcounts, we issue UPDATE/DELETE statements, which inherently wait for locks. If the engine lacks rowcount support, sticking with a plain SELECT FOR UPDATE keeps the fallback consistent with those statements, since it also waits for locks. This consistency eliminates the need to handle additional exceptions: once a concurrent transaction drops, renames, or updates the table, the lookup simply ends in `NoResultFound`.
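   
   Roughly what I have in mind for that fallback, as an untested sketch built on the names already in this diff (`IcebergTables`, `NoSuchTableError`, `database_name`, `table_name`):
   
   ```python
   from sqlalchemy.exc import NoResultFound
   from sqlalchemy.orm import Session
   
   with Session(self.engine) as session:
       try:
           tbl = (
               session.query(IcebergTables)
               # no nowait/skip_locked: wait for any concurrent lock, like the DELETE path does
               .with_for_update(of=IcebergTables)
               .filter(
                   IcebergTables.catalog_name == self.name,
                   IcebergTables.table_namespace == database_name,
                   IcebergTables.table_name == table_name,
               )
               .one()
           )
       except NoResultFound as e:
           # once the lock is released, a row dropped/renamed by a concurrent
           # transaction simply no longer matches
           raise NoSuchTableError(f"Table does not exist: {database_name}.{table_name}") from e
       session.delete(tbl)
       session.commit()
   ```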
   
   If we later decide that avoiding lock waits is preferable in the fallback, we could consider the following modifications (sketched in code below):
   1. For `drop_table` and `rename_table`, I think you're right that we can catch `sqlalchemy.exc.OperationalError` and re-raise it as a new exception indicating that another process is already operating on the table. Using `CommitFailedException` doesn't seem appropriate here, as it's primarily meant for failed commits on iceberg tables.
   2. For `_commit_table`, `skip_locked` might still be useful. By the time the SQL command is executed, we're already assured of the table's existence, so encountering `NoResultFound` would directly imply a concurrent commit attempt.
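   
   If we go that way, something along these lines (untested sketch; `TableLockedError` is only a placeholder name, not an existing pyiceberg exception, and re-using `CommitFailedException` in case 2 is just my assumption):
   
   ```python
   from sqlalchemy.exc import NoResultFound, OperationalError
   
   # (1) drop_table / rename_table: fail fast with NOWAIT and surface a dedicated error.
   try:
       tbl = (
           session.query(IcebergTables)
           .with_for_update(of=IcebergTables, nowait=True)
           .filter(
               IcebergTables.catalog_name == self.name,
               IcebergTables.table_namespace == database_name,
               IcebergTables.table_name == table_name,
           )
           .one()
       )
   except OperationalError as e:
       # placeholder exception name, to be defined in pyiceberg.exceptions
       raise TableLockedError(f"Another process holds a lock on: {database_name}.{table_name}") from e
   
   # (2) _commit_table: the table is known to exist at this point, so an empty result
   #     under SKIP LOCKED can only mean another transaction holds the row,
   #     i.e. a concurrent commit.
   try:
       tbl = (
           session.query(IcebergTables)
           .with_for_update(of=IcebergTables, skip_locked=True)
           .filter(
               IcebergTables.catalog_name == self.name,
               IcebergTables.table_namespace == database_name,
               IcebergTables.table_name == table_name,
           )
           .one()
       )
   except NoResultFound as e:
       raise CommitFailedException(
           f"Concurrent commit detected for: {database_name}.{table_name}"
       ) from e
   ```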
   
   Do you think the concern about keeping the rowcount and non-rowcount cases consistent is valid? If not, what are your thoughts on adopting NOWAIT for `drop_table` and `rename_table`, and SKIP_LOCKED for `_commit_table`?


