dramaticlly commented on PR #11597: URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2489434569
> Quick question: Is this a behavioral change? Previously we failed when the metadata was corrupt. After this, we succeed.
>
> How do we handle corrupt metadata in other catalog implementations?

Thank you @pvary. I think this does indeed introduce a behavioral change. The majority of existing catalogs (except ECSCatalog) rely on the default implementation in the Catalog interface, which tries to load the table first and returns true if the load is successful. I believe "table exists" here implies two things: the table entry exists in the catalog, and the latest table metadata.json is not corrupted. Personally, I think we can focus on the former only. Here is my thought process.

There are roughly 3 places where `catalog.tableExists` is used in the Iceberg code base:

1. Check before a table can be registered: https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L82
   - I believe the behaviour change is acceptable here: as long as the entry exists in the catalog, registration should fail regardless of whether the metadata files are corrupted.
2. Check before table stage creation: https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java#L228
   - I believe the behaviour change is also acceptable here: as long as the entry exists in the catalog, stage creation should fail regardless of whether the version files are corrupted.
3. REST API to check for table existence: https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/open-api/rest-catalog-open-api.yaml#L1129-L1133
   - This is what I originally hoped to optimize: speeding up the existence check without relying on reading metadata first. The reason is that sometimes an existence check is all we need, with no subsequent load table call.
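To make the two meanings of "exists" concrete, here is a minimal, self-contained sketch. It does not use Iceberg's real classes: `ToyCatalog`, `tableExistsViaLoad`, `tableExistsViaEntry`, and both exception types are hypothetical stand-ins, and a `null` metadata value is just an assumed way to model a corrupted metadata.json.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not Iceberg's actual API.
class TableExistsSketch {

  /** Stand-in for "the catalog has no entry for this table". */
  static class NoSuchTableException extends RuntimeException {
    NoSuchTableException(String name) {
      super("No such table: " + name);
    }
  }

  /** Stand-in for "the entry exists but its metadata cannot be read". */
  static class CorruptMetadataException extends RuntimeException {
    CorruptMetadataException(String name) {
      super("Corrupt metadata for table: " + name);
    }
  }

  /** Toy catalog: table name -> metadata content (null models a corrupt file). */
  static class ToyCatalog {
    private final Map<String, String> entries = new HashMap<>();

    void register(String name, String metadataJson) {
      entries.put(name, metadataJson);
    }

    String loadTable(String name) {
      if (!entries.containsKey(name)) {
        throw new NoSuchTableException(name);
      }
      String metadata = entries.get(name);
      if (metadata == null) {
        throw new CorruptMetadataException(name);
      }
      return metadata;
    }

    /** Load-based check: only a missing entry maps to false; corruption propagates. */
    boolean tableExistsViaLoad(String name) {
      try {
        loadTable(name);
        return true;
      } catch (NoSuchTableException e) {
        return false;
      }
    }

    /** Entry-only check: never reads the metadata at all. */
    boolean tableExistsViaEntry(String name) {
      return entries.containsKey(name);
    }
  }
}
```

In this toy model, a table registered with corrupt metadata makes the load-based check throw, while the entry-only check returns true. That is the behavioral difference being discussed, and it also shows why the entry-only check can be cheaper: it skips the metadata read.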