[ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287437#comment-16287437 ]
Steve Loughran commented on HADOOP-15107:
-----------------------------------------
I'm actually going to argue that, based on the examples of the v1 and v2
FileOutputCommitter algorithms, failures during job or task commit may be
handled simply by failing the application. What matters is that (a) the job
becomes aware of the failure & can choose how to react, and (b) at no point
afterwards will the data of a partitioned task become visible.
Here are my criteria for "Correct":
# Completeness of output: You get what was committed
# Exclusivity of output: And not what wasn't
# Continuity of correctness: A dead task attempt stays dead
# Consistency of the commit: Addresses S3's consistency problems, somehow
# Ability to abort: A cleaned-up job no longer exists
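The criteria can be stated as predicates over the observed state of a job. Below is a minimal Java sketch, assuming a hypothetical model in which each task attempt writes named files and the destination starts empty. None of the names are Hadoop APIs. Consistency (criterion 4) is left out because it is a property of the store rather than of the job's outcome.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a finished (or aborted) job, used only to state the
// criteria as checkable predicates. None of these names are Hadoop APIs.
// Assumes the destination contains nothing but this job's output.
final class CommitCriteria {
    // Files each task attempt wrote, keyed by attempt ID.
    final Map<String, Set<String>> filesByAttempt = new HashMap<>();
    // Attempts whose output the job commit promoted.
    final Set<String> committedAttempts = new HashSet<>();
    // Attempts the job declared failed (e.g. after a partition).
    final Set<String> failedAttempts = new HashSet<>();
    // Files visible in the destination directory.
    final Set<String> destination = new HashSet<>();
    // Uploads (or other temporary state) still outstanding.
    final Set<String> pendingUploads = new HashSet<>();

    private Set<String> filesOf(Set<String> attempts) {
        Set<String> files = new HashSet<>();
        for (String attempt : attempts) {
            files.addAll(filesByAttempt.getOrDefault(attempt, Set.of()));
        }
        return files;
    }

    // 1. Completeness: everything a committed attempt wrote is visible.
    boolean complete() {
        return destination.containsAll(filesOf(committedAttempts));
    }

    // 2. Exclusivity: every visible file came from a committed attempt.
    boolean exclusive() {
        return filesOf(committedAttempts).containsAll(destination);
    }

    // 3. Continuity: no attempt declared failed was ever committed.
    boolean continuous() {
        for (String attempt : failedAttempts) {
            if (committedAttempts.contains(attempt)) {
                return false;
            }
        }
        return true;
    }

    // 5. Abort: nothing visible and nothing left pending.
    // (4. Consistency depends on the store, so it is not modelled here.)
    boolean abortedCleanly() {
        return destination.isEmpty() && pendingUploads.isEmpty();
    }
}
```

A leaked file from a failed attempt shows up as a failed `exclusive()` check. Committing an attempt the job had already declared dead shows up as a failed `continuous()` check.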
The MPU (multipart upload) mechanism keeps output out of the destination until
job commit. Here we need to demonstrate that job commit will commit everything
from the committed tasks, and nothing from the uncommitted ones. The staging
committer picks up its manifests of MPUs to commit from HDFS (the consistent
store) & relies on the v1 commit algorithm for completeness, exclusivity and
abortability. I still need to check that the v1 algorithm handles continuity,
though.
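The argument for the staging committer can be illustrated with a small in-memory model. Uploads are started but not completed during the task. Task commit promotes a manifest, and job commit completes only the uploads listed in promoted manifests. This is a hypothetical Java sketch: the maps stand in for S3 and HDFS and are not Hadoop or AWS APIs. Aborting the leftover uploads inside `commitJob()` is an assumption standing in for job cleanup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the staging committer's job commit.
// The "store" and "HDFS" here are plain maps, not Hadoop or AWS APIs.
final class StagingJobCommitSketch {
    // Outstanding multipart uploads in the object store: uploadId -> key.
    final Map<String, String> pendingUploads = new HashMap<>();
    // Objects visible in the destination; only completed uploads appear.
    final Set<String> visible = new HashSet<>();
    // Manifests promoted by task commit in the consistent store:
    // taskId -> upload IDs of the attempt that committed.
    final Map<String, List<String>> committedPendingsets = new HashMap<>();
    private int nextUploadId = 0;

    // A task attempt writes a file: the upload is initiated but not
    // completed, so nothing becomes visible in the destination.
    String startUpload(String key) {
        String uploadId = "upload-" + nextUploadId++;
        pendingUploads.put(uploadId, key);
        return uploadId;
    }

    // Task commit: stands in for the v1 algorithm's rename of the task
    // attempt directory. An attempt that never gets here contributes nothing.
    void commitTask(String taskId, List<String> uploadIds) {
        committedPendingsets.put(taskId, new ArrayList<>(uploadIds));
    }

    // Job commit: complete exactly the uploads listed in committed
    // manifests, then abort every other pending upload (model of cleanup).
    void commitJob() {
        Set<String> toComplete = new HashSet<>();
        for (List<String> uploadIds : committedPendingsets.values()) {
            toComplete.addAll(uploadIds);
        }
        Iterator<Map.Entry<String, String>> it = pendingUploads.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> upload = it.next();
            if (toComplete.contains(upload.getKey())) {
                visible.add(upload.getValue()); // completed: object is visible
            }
            it.remove(); // completed or aborted: no longer pending either way
        }
    }
}
```

In this model, completeness and exclusivity reduce to "job commit reads the right set of manifests". That is the property the v1 rename is being relied on to provide.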
The magic committer relies on S3Guard for its consistency; again, you only get
what was committed (& post-commit job cleanup removes old uploads). I need to
make sure that you *only* get the committed output, even if a committer is
partitioned after being sent the commitTask() operation & before returning
done(). If all tasks PUT their manifest to a $taskId.pendingset file, then we
can be confident that exactly one attempt of each task will have committed its
work: if there is a partition, only one of the multiple task attempts will
commit its output.
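The single-manifest-per-task argument can be modelled with a map in which each PUT replaces the whole object. Below is a hypothetical Java sketch that assumes a read returns one complete version of `$taskId.pendingset`. The class, manifest layout and upload IDs are illustrative only, not the committer's real classes or file format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of task commit writing $taskId.pendingset with a PUT.
// Each PUT replaces the whole object, so a reader always sees exactly one
// complete manifest per key.
final class PendingsetSketch {
    static final class Manifest {
        final String attemptId;
        final List<String> uploadIds;

        Manifest(String attemptId, List<String> uploadIds) {
            this.attemptId = attemptId;
            this.uploadIds = new ArrayList<>(uploadIds);
        }
    }

    final Map<String, Manifest> store = new HashMap<>();

    // commitTask(): every attempt of a task PUTs to the same key, so a later
    // PUT (e.g. from a partitioned attempt) replaces an earlier one.
    void commitTask(String taskId, String attemptId, List<String> uploadIds) {
        store.put(taskId + ".pendingset", new Manifest(attemptId, uploadIds));
    }

    // Job commit: one manifest per task, hence one committed attempt per task.
    Map<String, String> committedAttemptPerTask() {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, Manifest> e : store.entrySet()) {
            result.put(e.getKey(), e.getValue().attemptId);
        }
        return result;
    }

    // Uploads job commit would complete; anything not listed is left for
    // the post-commit cleanup to abort.
    Set<String> uploadsToComplete() {
        Set<String> ids = new HashSet<>();
        for (Manifest m : store.values()) {
            ids.addAll(m.uploadIds);
        }
        return ids;
    }
}
```

Suppose `attempt_0_0` and a partitioned `attempt_0_1` both commit `task_0`. Only one manifest survives, and in this model the winner is decided purely by PUT order. The losing attempt's uploads are never completed and are left for the job cleanup to abort.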
> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
> Key: HADOOP-15107
> URL: https://issues.apache.org/jira/browse/HADOOP-15107
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> I'm writing a paper on the committers, one which, being a proper
> paper, requires me to show that the committers work.
> # define the requirements of a "Correct" committed job (this applies to the
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is
> implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset
> lists from committed tasks to the final destination, where they are read and
> committed).
> # Show the magic committer also works.
> I'm no longer sure that the magic committer works.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)