BsoBird opened a new pull request, #9333: URL: https://github.com/apache/iceberg/pull/9333

We found that, under some boundary conditions, tables managed by HadoopCatalog can lose data files. The problem is in BaseFileRewriteAction::doReplace: when it catches an exception it does not recognize, Spark cleans up the data files written for that commit (only the data files, not the metadata files). However, HadoopCatalog's commit method is not atomic; it has to perform several file operations on HDFS. If, partway through those operations, another DAG triggers an OOM after Iceberg's metadata has already been published but before the commit method has finished, the commit call throws a RuntimeException. That exception is caught by doReplace and triggers the data file cleanup. The end result is that the metadata file is written, but the data files it references are deleted. I have changed the commit logic so that once the version hint has been modified, any exception raised after that point skips the data cleanup step.
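To illustrate the intent of the fix, here is a minimal, hypothetical sketch of a guarded commit flow. The field and helper names (`versionHintWritten`, `renameMetadataFile`, `writeVersionHint`, `cleanupOldMetadata`) are illustrative only and may not match the actual HadoopTableOperations code or this PR's diff; the idea is that once the version hint is updated, any later failure is surfaced as a CommitStateUnknownException so callers do not delete data files that the now-visible metadata references.

```java
import org.apache.iceberg.exceptions.CommitStateUnknownException;

public class GuardedHadoopCommit {

  public void commit(String newMetadataLocation, int nextVersion) {
    boolean versionHintWritten = false;
    try {
      // Step 1: move the new metadata file into place (not atomic with step 2).
      renameMetadataFile(newMetadataLocation, nextVersion);

      // Step 2: update version-hint.text so readers pick up the new version.
      writeVersionHint(nextVersion);
      versionHintWritten = true;

      // Step 3: remaining bookkeeping (e.g. removing old metadata) can still
      // fail here, after the commit is already visible to readers.
      cleanupOldMetadata();
    } catch (RuntimeException e) {
      if (versionHintWritten) {
        // The new metadata is already live. Re-throwing as
        // CommitStateUnknownException signals callers (such as the rewrite
        // action) to skip deleting the data files referenced by it.
        throw new CommitStateUnknownException(e);
      }
      throw e;
    }
  }

  // Placeholders for the real HDFS operations.
  private void renameMetadataFile(String location, int version) { /* ... */ }

  private void writeVersionHint(int version) { /* ... */ }

  private void cleanupOldMetadata() { /* ... */ }
}
```

Whether the actual change uses this exception type or a direct skip of the cleanup path, the key design point is the same: a failure that happens after the commit has become visible must not be treated as a failed commit.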
We found that under some boundary conditions, HadoopCatalogTable will suffer from data file loss. The problem is in BaseFileRewriteAction::doReplace. When an exception is caught, if it is an unrecognised exception, Spark will clean up the data file for this commit.(only datafile,not have metafile) But if we use HadoopCatalog, its commit method is not atomic. HadoopCatalog's commit method needs to operate the file in hdfs several times, if in the process of operating the file, due to other DAG triggered OOM, and it happens that Iceberg's metadata has been submitted, but the entire commit method is not finished. The call to the commit method will throw a RunTimeException. Caught by the doReplace method. Triggers a cleanup of the data file. The end result is that the metafile is written, but the corresponding datafile is cleaned up. I've changed the logic of commit so that when the versionhint is modified, if there is an exception at that point, we will skip the data cleanup session. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org