BsoBird opened a new pull request, #9333: URL: https://github.com/apache/iceberg/pull/9333

We found that, under some boundary conditions, tables managed by HadoopCatalog can lose data files. The problem is in BaseFileRewriteAction::doReplace: when it catches an exception it does not recognize, Spark cleans up the data files written for that commit (only the data files, not the metadata files). However, HadoopCatalog's commit method is not atomic; it has to perform several file operations on HDFS. If, partway through those operations, another DAG triggers an OOM after Iceberg's metadata has already been published but before the commit method has finished, the commit call throws a RuntimeException. That exception is caught by doReplace and triggers the data file cleanup. The end result is that the metadata file is written, but the data files it references are deleted. I have changed the commit logic so that once the version hint has been modified, any exception raised after that point skips the data cleanup step.
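To illustrate the intent of the fix, here is a minimal, hypothetical sketch of a guarded commit flow. The field and helper names (`versionHintWritten`, `renameMetadataFile`, `writeVersionHint`, `cleanupOldMetadata`) are illustrative only and may not match the actual HadoopTableOperations code or this PR's diff; the idea is that once the version hint is updated, any later failure is surfaced as a CommitStateUnknownException so callers do not delete data files that the now-visible metadata references.

```java
import org.apache.iceberg.exceptions.CommitStateUnknownException;

public class GuardedHadoopCommit {

  public void commit(String newMetadataLocation, int nextVersion) {
    boolean versionHintWritten = false;
    try {
      // Step 1: move the new metadata file into place (not atomic with step 2).
      renameMetadataFile(newMetadataLocation, nextVersion);

      // Step 2: update version-hint.text so readers pick up the new version.
      writeVersionHint(nextVersion);
      versionHintWritten = true;

      // Step 3: remaining bookkeeping (e.g. removing old metadata) can still
      // fail here, after the commit is already visible to readers.
      cleanupOldMetadata();
    } catch (RuntimeException e) {
      if (versionHintWritten) {
        // The new metadata is already live. Re-throwing as
        // CommitStateUnknownException signals callers (such as the rewrite
        // action) to skip deleting the data files referenced by it.
        throw new CommitStateUnknownException(e);
      }
      throw e;
    }
  }

  // Placeholders for the real HDFS operations.
  private void renameMetadataFile(String location, int version) { /* ... */ }

  private void writeVersionHint(int version) { /* ... */ }

  private void cleanupOldMetadata() { /* ... */ }
}
```

Whether the actual change uses this exception type or a direct skip of the cleanup path, the key design point is the same: a failure that happens after the commit has become visible must not be treated as a failed commit.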
We found that under some boundary conditions, HadoopCatalogTable will suffer from data file loss. The problem is in BaseFileRewriteAction::doReplace. When an exception is caught, if it is an unrecognised exception, Spark will clean up the data file for this commit.(only datafile,not have metafile) But if we use HadoopCatalog, its commit method is not atomic. HadoopCatalog's commit method needs to operate the file in hdfs several times, if in the process of operating the file, due to other DAG triggered OOM, and it happens that Iceberg's metadata has been submitted, but the entire commit method is not finished. The call to the commit method will throw a RunTimeException. Caught by the doReplace method. Triggers a cleanup of the data file. The end result is that the metafile is written, but the corresponding datafile is cleaned up. I've changed the logic of commit so that when the versionhint is modified, if there is an exception at that point, we will skip the data cleanup session. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org