zstraw commented on issue #4550:
URL: https://github.com/apache/iceberg/issues/4550#issuecomment-1338908066

   After digging into the Iceberg code and the logs, I can reproduce it locally 
in a debugger.
   
   The scenario can happen while Flink is cancelling a task.
   1. IcebergFileCommitter is about to commit files. In the step that **renames** 
metadata.json (org.apache.iceberg.hadoop.HadoopTableOperations#renameToFinal), 
org.apache.hadoop.ipc.Client.call throws an **InterruptedIOException**. I 
suspect it comes from the Flink task being cancelled. However, **HDFS has 
already renamed the metadata.json file successfully**.
   2. After the rename fails, it is supposed to retry. But the thread hits an 
InterruptedException while sleeping 
(org.apache.iceberg.util.Tasks#runTaskWithRetry) and throws a 
RuntimeException, so the version-hint is not updated.
   3. The RuntimeException triggers a **rollback** in 
org.apache.iceberg.BaseTransaction (#cleanUpOnCommitFailure), which deletes the 
manifest list (snap-XXX).
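
   The three steps above can be sketched as a small standalone simulation. This 
is only an illustration under stated assumptions, not Iceberg or Hadoop code: 
the class, the method bodies, the in-memory file set and the file names 
(`temp.metadata.json`, `v2.metadata.json`, `snap-XXX.avro`) are all 
hypothetical, and it assumes the thread's interrupt flag is still set when the 
retry backoff begins.

```java
import java.io.InterruptedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation; none of these names are Iceberg or Hadoop APIs.
final class InterruptedCommitSketch {
    // In-memory stand-in for the table's metadata directory on HDFS.
    static final Set<String> files = new HashSet<>();
    // Stand-in for the version-hint file; 1 means "v1 is current".
    static int versionHint = 1;

    // Step 1: the server applies the rename, but the client is interrupted
    // before it sees the reply, so the caller believes the rename failed.
    static void renameToFinal(String src, String dst) throws InterruptedIOException {
        files.remove(src);
        files.add(dst);
        // Assumption: cancellation leaves the interrupt flag set on this thread.
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("interrupted while waiting for the rename reply");
    }

    // Step 2: the retry backoff is interrupted, which surfaces as a RuntimeException.
    static void commitWithRetry(String src, String dst) {
        try {
            renameToFinal(src, dst);
            versionHint = 2; // only reached if the rename call returns normally
        } catch (InterruptedIOException e) {
            try {
                // Throws immediately because the interrupt flag is already set.
                Thread.sleep(100);
                // A real retry would happen here; it is never reached in this scenario.
            } catch (InterruptedException ie) {
                throw new RuntimeException("interrupted during retry backoff", ie);
            }
        }
    }

    // Step 3: the failure triggers cleanup, which deletes the manifest list.
    static Set<String> simulate() {
        files.clear();
        versionHint = 1;
        files.add("snap-XXX.avro");
        files.add("temp.metadata.json");
        try {
            commitWithRetry("temp.metadata.json", "v2.metadata.json");
        } catch (RuntimeException e) {
            files.remove("snap-XXX.avro"); // rollback removes the manifest list
        } finally {
            Thread.interrupted(); // clear the flag so the caller is unaffected
        }
        return new HashSet<>(files);
    }

    public static void main(String[] args) {
        System.out.println(simulate() + " versionHint=" + versionHint);
    }
}
```

   In this sketch the renamed metadata file survives while the manifest list 
is gone and the version hint still points at the old version, which matches 
the inconsistent state described in the steps above.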


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

