abmo-x opened a new pull request, #9998: URL: https://github.com/apache/iceberg/pull/9998
This is a fix for https://github.com/apache/iceberg/issues/9997

### Root cause

The s3a `putObject` call was interrupted because of a Flink pipeline failure. Because the interrupt is swallowed (logged, with `0` returned) rather than thrown as an exception, the metadata writer assumes the write was successful, which results in the table pointing to a metadata JSON file that doesn't exist.

#### Hadoop S3A code

https://github.com/apache/hadoop/blob/0f51d2a4ec17bad754beb17048409811a151be53/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java#L636

```
try {
  putObjectResult.get();
  return size;
} catch (InterruptedException ie) {
  LOG.warn("Interrupted object upload", ie);
  Thread.currentThread().interrupt();
  return 0;
} catch (ExecutionException ee) {
  throw extractException("regular upload", key, ee);
}
```

#### Iceberg commit successful on upload interrupt

The commit is reported as successful:

`2024-03-06T23:06:36.160+00:00 INFO org.apache.iceberg.hive.HiveTableOperations Committed to table iceberg.default.some_table with the new metadata location s3a://bucket/prefix/default.db/some_table/metadata/52949-1e28478a-bf5e-4ec0-8d2c-......metadata.json`

But the upload request had been interrupted:

`2024-03-06T23:06:36.024+00:00 WARN org.apache.hadoop.fs.s3a.S3ABlockOutputStream Interrupted object upload`
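
The failure mode described under "Root cause" can be reproduced in isolation. The following is a minimal sketch, not the change made in this PR: `InterruptedUploadSketch`, `putObject`, and `writeChecked` are hypothetical names, and `putObject` only mirrors the shape of the quoted S3A snippet using a plain `Future`. It shows that the swallowed interrupt yields a normal return of `0`, and that one possible caller-side guard is to check the thread's interrupt flag after the write and fail instead of committing.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical, self-contained illustration; none of these names exist in Hadoop or Iceberg.
class InterruptedUploadSketch {

  // Same shape as the quoted S3A code: on interrupt, restore the flag and
  // return 0 instead of throwing, so the caller sees a normal return.
  public static long putObject(Future<?> putObjectResult, long size) throws IOException {
    try {
      putObjectResult.get();
      return size;
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      return 0;
    } catch (ExecutionException ee) {
      throw new IOException("regular upload failed", ee.getCause());
    }
  }

  // Assumed guard on the caller side: if the interrupt flag is set after the
  // write returns, treat the write as failed rather than committing.
  public static long writeChecked(Future<?> putObjectResult, long size) throws IOException {
    long written = putObject(putObjectResult, size);
    if (Thread.currentThread().isInterrupted()) {
      throw new InterruptedIOException("Upload was interrupted; not committing metadata");
    }
    return written;
  }

  public static void main(String[] args) throws IOException {
    // A future that never completes stands in for the in-flight upload.
    Future<Void> pendingUpload = new CompletableFuture<>();

    // Simulate the pipeline failure interrupting the writer thread.
    Thread.currentThread().interrupt();
    long written = putObject(pendingUpload, 1024L);
    System.out.println("unchecked write returned normally, bytes reported: " + written);

    // The flag was restored by putObject, so the guarded variant fails loudly.
    try {
      writeChecked(pendingUpload, 1024L);
    } catch (InterruptedIOException e) {
      System.out.println("checked write failed: " + e.getMessage());
    } finally {
      Thread.interrupted(); // clear the flag so the demo exits cleanly
    }
  }
}
```

Checking the interrupt flag is only one option under these assumptions; a writer could instead verify that the metadata file exists before committing.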