dramaticlly commented on PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2489434569

   > Quick question: Is this a behavioral change? Previously we failed when the 
metadata was corrupt. After this, we succeed.
   > 
   > How do we handle corrupt metadata in other catalog implementations?
   
   Thank you @pvary, I think this does indeed introduce a behavioral change. 
The majority of existing catalogs (except ECSCatalog) rely on the default 
implementation in the Catalog interface, where we try to load the table first 
and return true if the load is successful. I believe "table exists" here 
implies two things: the table entry exists in the catalog, and the latest 
table metadata.json is not corrupted. 
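   
   To make the two meanings concrete, here is a minimal, self-contained sketch 
(not the actual Iceberg code). `ToyCatalog`, `tableExistsViaLoad`, 
`tableExistsViaEntry` and both exception classes are hypothetical stand-ins; a 
`null` metadata value is just my way of modelling a corrupted metadata.json.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; none of these types are real Iceberg classes.
final class TableExistsSketch {

    /** Stand-in for "no entry for this table in the catalog". */
    static class NoSuchTableException extends RuntimeException {
        NoSuchTableException(String name) { super("No such table: " + name); }
    }

    /** Stand-in for a failure while reading a corrupted metadata.json. */
    static class CorruptMetadataException extends RuntimeException {
        CorruptMetadataException(String name) { super("Corrupt metadata: " + name); }
    }

    /** Toy catalog: table name -> metadata contents (null models corruption). */
    static class ToyCatalog {
        private final Map<String, String> entries = new HashMap<>();

        void register(String name, String metadataJson) {
            entries.put(name, metadataJson);
        }

        String loadTable(String name) {
            if (!entries.containsKey(name)) {
                throw new NoSuchTableException(name);
            }
            String metadata = entries.get(name);
            if (metadata == null) {
                throw new CorruptMetadataException(name);
            }
            return metadata;
        }

        /** Load-based check: only a missing entry maps to false; corruption propagates. */
        boolean tableExistsViaLoad(String name) {
            try {
                loadTable(name);
                return true;
            } catch (NoSuchTableException e) {
                return false;
            }
        }

        /** Entry-only check: never reads metadata, so corruption is not noticed. */
        boolean tableExistsViaEntry(String name) {
            return entries.containsKey(name);
        }
    }
}
```

   With an entry whose metadata is corrupted, the load-based check fails with 
an exception while the entry-only check returns true, which is the behavioral 
difference being discussed.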
   
   Personally, I think we can focus on the former only. Here is my thought process:
   
   There are roughly 3 places where `catalog.tableExists` is used in the 
Iceberg code base:
   1. Check before a table can be registered: 
https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L82
 
     - I believe the behaviour change is acceptable here: as long as the entry 
exists in the catalog, registration should fail regardless of whether the files 
are corrupted
   2. Check before table stage creation: 
https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java#L228
     - I believe the behaviour change is also acceptable here: as long as the 
entry exists in the catalog, stage creation should fail regardless of whether 
the version files are corrupted
   3. REST API to check for table existence: 
https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/open-api/rest-catalog-open-api.yaml#L1129-L1133
     - I think this is what I originally hoped to optimize: speeding up the 
existence check without relying on reading metadata first. The reason is that 
sometimes an existence check is all we need, with no subsequent load table call
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

