bharos commented on PR #10675:
URL: https://github.com/apache/gravitino/pull/10675#issuecomment-4277155901

   @FANNG1 I agree with your thoughts. Two main concerns now are:
   
   **1. Atomicity limitation**
   The core challenge is that `TableOperations.commit(base, updated)` is a per-table CAS where each call independently swaps one table's `metadata_location` pointer. For JDBC-backed catalogs, this is a single `UPDATE ... SET metadata_location = ? WHERE metadata_location = ?` on its own DB connection (`JdbcUtil.update()`). There's no way to batch N `TableOperations.commit()` calls into one atomic operation through the existing interface; Iceberg's own [test adapter has the same limitation](https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java#L554-L556) and explicitly documents it.
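
   To make the per-table CAS concrete, here is a minimal sketch in Python, using sqlite3 purely as a stand-in for the JDBC backing store (the `iceberg_tables` table name mirrors the JDBC catalog's schema, but the simplified columns and the `cas_swap` helper are illustrative assumptions, not Gravitino or Iceberg code):

```python
import sqlite3

def cas_swap(conn, name, old_loc, new_loc):
    """Stand-in for one TableOperations.commit(): an independent
    compare-and-swap of a single table's metadata_location pointer."""
    cur = conn.execute(
        "UPDATE iceberg_tables SET metadata_location = ? "
        "WHERE table_name = ? AND metadata_location = ?",
        (new_loc, name, old_loc),
    )
    conn.commit()  # each call commits on its own; no cross-table atomicity
    return cur.rowcount == 1  # False => someone else moved the pointer first

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iceberg_tables (table_name TEXT, metadata_location TEXT)")
conn.execute("INSERT INTO iceberg_tables VALUES ('t1', 'v1.json')")
conn.commit()

print(cas_swap(conn, "t1", "v1.json", "v2.json"))  # True: pointer flipped
print(cas_swap(conn, "t1", "v1.json", "v3.json"))  # False: stale base, CAS misses
```

   Because each call commits independently, a loop of such calls over N tables can stop halfway through, which is exactly the gap described above.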
   
   To achieve true atomicity, Gravitino needs to bypass TableOperations for the 
pointer-swap step and batch the updates directly in the backing store. For 
JDBC-backed catalogs, that means running all pointer CAS updates in a single DB 
transaction:
   
   ```
   Phase 1 (validate): unchanged — validate all requirements, write metadata 
files
   Phase 2 (atomic swap):
     BEGIN;
       for each table:
         UPDATE iceberg_tables
           SET metadata_location = :newLoc
           WHERE catalog_name = :cat AND table_namespace = :ns
             AND table_name = :name AND metadata_location = :oldLoc;
         -- if affected_rows != 1 → ROLLBACK (concurrent modification)
     COMMIT;  -- all pointers flip together, or none do
   ```
   If Phase 2 rolls back, the metadata files written in Phase 1 become orphaned 
on object storage, but this is the same as any failed single-table commit in 
Iceberg — unreferenced metadata files are cleaned up by normal table 
maintenance (RemoveOrphanFiles).
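
   The Phase 2 batch above can be sketched end to end, again with Python/sqlite3 standing in for the backing store (the schema, the `atomic_swap` helper, and the all-or-nothing rollback condition are illustrative assumptions):

```python
import sqlite3

def atomic_swap(conn, swaps):
    """Phase 2: flip all metadata_location pointers in ONE DB transaction.
    swaps is a list of (table_name, old_loc, new_loc). If any CAS misses
    (affected rows != 1), the whole batch rolls back: no pointer moves."""
    try:
        with conn:  # sqlite3 context manager: COMMIT on success, ROLLBACK on exception
            for name, old_loc, new_loc in swaps:
                cur = conn.execute(
                    "UPDATE iceberg_tables SET metadata_location = ? "
                    "WHERE table_name = ? AND metadata_location = ?",
                    (new_loc, name, old_loc),
                )
                if cur.rowcount != 1:
                    raise RuntimeError(f"concurrent modification on {name}")
        return True
    except RuntimeError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iceberg_tables (table_name TEXT, metadata_location TEXT)")
conn.executemany("INSERT INTO iceberg_tables VALUES (?, ?)",
                 [("t1", "a1.json"), ("t2", "b1.json")])
conn.commit()

# t2's expected base is stale => the whole batch rolls back, t1 stays untouched
ok = atomic_swap(conn, [("t1", "a1.json", "a2.json"), ("t2", "WRONG", "b2.json")])
print(ok)  # False
```

   The failed run leaves `t1` at `a1.json`; only the new metadata files written in Phase 1 are orphaned, matching the cleanup story above.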
   
   The key requirement is that the catalog server has direct access to the 
backing store's transaction primitives, rather than going through 
TableOperations.commit() which encapsulates each table's CAS independently.
   
   **2. Security escalation**
   The endpoint checks catalog-level permissions (`ANY_USE_CATALOG`), while `updateTable` requires table-level permissions (`TABLE::OWNER || ANY_MODIFY_TABLE`). A user with only `USE_CATALOG` could modify tables via `commitTransaction` that they couldn't touch via `updateTable`. Per-table authorization checks need to happen within the dispatcher chain for each table in the transaction.
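
   A sketch of the per-table check, authorizing the entire batch before any metadata is touched (the `authorize_commit_transaction` function, the `has_table_privilege` callback, and the privilege strings below are placeholders for illustration, not Gravitino's actual authorization API):

```python
class ForbiddenError(Exception):
    pass

def authorize_commit_transaction(user, table_names, has_table_privilege):
    """Reject the whole batch if the user lacks table-level rights on ANY
    table, mirroring what updateTable would enforce per table."""
    for table in table_names:
        if not (has_table_privilege(user, table, "OWNER")
                or has_table_privilege(user, table, "ANY_MODIFY_TABLE")):
            raise ForbiddenError(f"{user} may not modify {table}")

# Example: catalog-level USE alone must not let a table slip through
grants = {("alice", "t1", "ANY_MODIFY_TABLE")}
check = lambda u, t, p: (u, t, p) in grants

authorize_commit_transaction("alice", ["t1"], check)        # passes
try:
    authorize_commit_transaction("alice", ["t1", "t2"], check)
except ForbiddenError as e:
    print(e)  # alice may not modify t2
```

   Checking all tables up front keeps the failure mode clean: the request is rejected as a whole rather than after some pointers have already flipped.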
   
   I think multi-table atomicity deserves a design doc before implementation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
