zstraw commented on issue #4550: URL: https://github.com/apache/iceberg/issues/4550#issuecomment-1338908066
After digging into the Iceberg code and the logs, I can reproduce it locally in a debugger. The scenario can happen while a Flink task is being cancelled.

1. IcebergFileCommitter is about to commit files. During the **rename** of metadata.json (org.apache.iceberg.hadoop.HadoopTableOperations#renameToFinal), org.apache.hadoop.ipc.Client.call throws an **InterruptedIOException**. I suspect the interrupt comes from the Flink task cancellation. However, **HDFS has already renamed the metadata.json file successfully**.
2. After the rename fails, the commit is supposed to be retried. But the thread hits an InterruptedException while sleeping before the retry (org.apache.iceberg.util.Tasks#runTaskWithRetry), so a RuntimeException is thrown and the version-hint is not updated.
3. The RuntimeException triggers a **rollback** in org.apache.iceberg.BaseTransaction (#cleanUpOnCommitFailure), which deletes the manifest list (snap-XXX).
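
The three steps above can be replayed in isolation with a small in-memory simulation. This is only a sketch of the sequence as I understand it, not Iceberg or Hadoop code: `CommitRaceSketch`, `FakeFs`, `commitWithRetry`, the file names and the retry count are all hypothetical, and setting the interrupt flag up front is my stand-in for the Flink task cancellation.

```java
import java.io.InterruptedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the failure sequence; none of these names are Iceberg APIs.
class CommitRaceSketch {

    /** Toy file system: the rename is applied, but the caller still sees an exception. */
    static final class FakeFs {
        final Set<String> files = new HashSet<>();

        void rename(String src, String dst) throws InterruptedIOException {
            // The "server side" of the rename completes...
            files.remove(src);
            files.add(dst);
            // ...but the client is interrupted before it sees the response (step 1).
            throw new InterruptedIOException("interrupted while waiting for the RPC response");
        }
    }

    /** Final state of the simulated table directory. */
    static final class Outcome {
        final boolean metadataCommitted;
        final boolean manifestListExists;
        final int versionHint;

        Outcome(boolean metadataCommitted, boolean manifestListExists, int versionHint) {
            this.metadataCommitted = metadataCommitted;
            this.manifestListExists = manifestListExists;
            this.versionHint = versionHint;
        }
    }

    /** Retry loop whose back-off sleep fails immediately when the thread is interrupted (step 2). */
    static void commitWithRetry(FakeFs fs) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                fs.rename("tmp-metadata.json", "v2.metadata.json");
                return;
            } catch (InterruptedIOException e) {
                try {
                    Thread.sleep(100); // back off before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        throw new RuntimeException("retries exhausted");
    }

    static Outcome simulate() {
        FakeFs fs = new FakeFs();
        fs.files.add("snap-1.avro");       // manifest list for the new snapshot
        fs.files.add("tmp-metadata.json"); // new metadata, not yet visible
        int versionHint = 1;

        // Assumption: task cancellation is modelled by setting the interrupt flag.
        Thread.currentThread().interrupt();
        try {
            commitWithRetry(fs);
            versionHint = 2; // only reached when the commit reports success
        } catch (RuntimeException e) {
            // Step 3: the commit is treated as failed, so cleanup deletes the manifest list.
            fs.files.remove("snap-1.avro");
        } finally {
            Thread.interrupted(); // clear the flag so the caller is not affected
        }
        return new Outcome(
            fs.files.contains("v2.metadata.json"),
            fs.files.contains("snap-1.avro"),
            versionHint);
    }

    public static void main(String[] args) {
        Outcome o = simulate();
        System.out.println("metadata renamed:     " + o.metadataCommitted);
        System.out.println("manifest list exists: " + o.manifestListExists);
        System.out.println("version hint:         " + o.versionHint);
    }
}
```

In this model the new metadata file is in place, the manifest list is gone, and the version hint still points at the old version. If the real code behaves the same way, that is the inconsistent state the steps imply: a metadata.json that has been renamed into place while the manifest list it was written with has been deleted.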