puchengy commented on code in PR #12818:
URL: https://github.com/apache/iceberg/pull/12818#discussion_r2071779023
########## core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java: ##########

```diff
@@ -155,13 +158,21 @@ public void commit(TableMetadata base, TableMetadata metadata) {
     // the error handler will throw necessary exceptions like CommitFailedException and
     // UnknownCommitStateException
     // TODO: ensure that the HTTP client lib passes HTTP client errors to the error handler
-    LoadTableResponse response =
-        client.post(path, request, LoadTableResponse.class, headers, errorHandler);
+    try {
+      LoadTableResponse response =
+          client.post(path, request, LoadTableResponse.class, headers, errorHandler);
+      // all future commits should be simple commits
+      this.updateType = UpdateType.SIMPLE;

-    // all future commits should be simple commits
-    this.updateType = UpdateType.SIMPLE;
-
-    updateCurrentMetadata(response);
+      updateCurrentMetadata(response);
+    } catch (RESTException e) {
+      if (e.getCause() != null && e.getCause() instanceof IOException) {
+        // any IOException or unhandled Exception should be considered as commit unknown
+        // so that caller can attempt to potentially reconcile
+        throw new CommitStateUnknownException(e);
+      }
+      throw e;
+    }
```

Review Comment:
@amogh-jahagirdar @singhpk234 The trigger of the deletion might not come from
https://github.com/apache/iceberg/blob/61e8acecf512d8dd2a72727803e5836ea1099eed/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L472
It could also come from the SparkWrite abort cleanup (using Spark 3.5 as an example, though I am using 3.2 internally):
https://github.com/apache/iceberg/blob/61e8acecf512d8dd2a72727803e5836ea1099eed/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java#L293

I suspect this is possible because, in one failed instance I encountered (https://github.com/apache/iceberg/issues/12792#issuecomment-2847442190), I saw these logs next to the failure message:

```
25/04/29 23:20:40 ERROR [Thread-5] v2.OverwriteByExpressionExec: Data source write support IcebergBatchWrite(table=spark_catalog.ad.ads_hourly_onsite_insertion_data, format=PARQUET) is aborting.
25/04/29 23:20:45 INFO [Thread-5] source.SparkCleanupUtil: Deleted 49152 file(s) using bulk deletes (job abort)
25/04/29 23:20:45 ERROR [Thread-5] v2.OverwriteByExpressionExec: Data source write support IcebergBatchWrite(table=spark_catalog.ad.ads_hourly_onsite_insertion_data, format=PARQUET) aborted.
```
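To make the intent of the diff concrete, here is a minimal, self-contained sketch of the wrapping pattern it introduces. The exception classes and the `Committer` interface below are simplified stand-ins invented for illustration; the real types are `org.apache.iceberg.exceptions.RESTException` and `org.apache.iceberg.exceptions.CommitStateUnknownException`, and the real call is `client.post(...)`.

```java
import java.io.IOException;

// Sketch (with stand-in types) of the commit-path change under review:
// a RESTException caused by an IOException means we cannot know whether the
// server applied the commit, so it is surfaced as "commit state unknown"
// rather than a definite failure. A definite failure lets callers (e.g.
// Spark's abort cleanup) delete data files; an unknown state must not.
public class CommitWrapSketch {
  // Stand-in for org.apache.iceberg.exceptions.RESTException
  static class RestException extends RuntimeException {
    RestException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Stand-in for org.apache.iceberg.exceptions.CommitStateUnknownException
  static class CommitStateUnknown extends RuntimeException {
    CommitStateUnknown(Throwable cause) {
      super(cause);
    }
  }

  // Stand-in for the REST commit call (client.post in the real code)
  interface Committer {
    void post();
  }

  static void commit(Committer committer) {
    try {
      committer.post();
    } catch (RestException e) {
      // instanceof is false for a null cause, so no separate null check is needed
      if (e.getCause() instanceof IOException) {
        throw new CommitStateUnknown(e);
      }
      throw e; // the server reported a definite failure; safe to rethrow as-is
    }
  }

  public static void main(String[] args) {
    boolean sawUnknown = false;
    try {
      commit(() -> {
        throw new RestException("request failed", new IOException("connection reset"));
      });
    } catch (CommitStateUnknown e) {
      sawUnknown = true;
    }
    System.out.println(sawUnknown ? "commit state unknown" : "definite failure");
  }
}
```

One design note on why this matters for the logs above: only exceptions that signal a *definite* failure should trigger `SparkCleanupUtil` bulk deletes on abort; wrapping network-level failures as commit-unknown is what prevents that cleanup from deleting files a possibly-successful commit now references.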