[
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287767#comment-16287767
]
Steve Loughran commented on HADOOP-15107:
-----------------------------------------
I'd like the option to set an x-committed header on created files, declaring
the job/task ID at the time the multipart upload (MPU) was created.
This would allow us to examine the list of files in a destination, and assert
that all objects were created from a successful task attempt which we know of,
not an unsuccessful one.
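A minimal sketch of that assertion. It assumes the destination listing has already been fetched into a map from object path to the value of the proposed x-committed header (null when absent), and that the header value is just the task attempt ID. The class name, method name and attempt IDs are hypothetical; this is not S3A code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical provenance check over a destination listing; not part of S3A. */
final class ProvenanceCheck {

  /**
   * Returns the paths whose x-committed header is missing, or names a task
   * attempt which is not in the set of known successful attempts.
   */
  static List<String> unverified(Map<String, String> headerByPath,
                                 Set<String> successfulAttempts) {
    List<String> bad = new ArrayList<>();
    // Iterate in path order so the report is deterministic.
    for (Map.Entry<String, String> e : new TreeMap<>(headerByPath).entrySet()) {
      String attempt = e.getValue();
      if (attempt == null || !successfulAttempts.contains(attempt)) {
        bad.add(e.getKey());
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    Map<String, String> listing = new HashMap<>();
    listing.put("dest/part-0000", "attempt_1_0001_m_000000_0");
    listing.put("dest/part-0001", "attempt_1_0001_m_000001_0"); // from a failed attempt
    listing.put("dest/part-0002", null);                         // no header at all
    Set<String> ok = Set.of("attempt_1_0001_m_000000_0", "attempt_1_0001_m_000001_1");
    System.out.println(unverified(listing, ok)); // [dest/part-0001, dest/part-0002]
  }
}
```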
We could also have the _SUMMARY file include a list of successful tasks.
* add a task attempt ID to the .pending and .pendingset files (the single commit
has a task ID, but I want the full set: job, task, task attempt)
* have job commit build the list of committed tasks as it reads the set of
.pendingset files.
* we can add an optional post-commit check to verify that all new files have a
header matching their entries, i.e. that no files were committed whose
provenance was elsewhere.
* and of course, gives you some diagnostics when backtracking the provenance of
stuff.
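A sketch of the job-commit side of this, assuming each .pendingset gains the job, task and task attempt IDs described above. The `PendingSet` record layout and all names are hypothetical; the real files are not modelled here. Job commit accumulates the committed attempts for _SUMMARY and the expected header per file, and the optional check flags any new file with no entry or a mismatched header.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical job-commit bookkeeping; the record layout is an assumption. */
final class JobCommitSummary {

  /** What a .pendingset would carry under the proposal. */
  record PendingSet(String jobId, String taskId, String taskAttemptId, List<String> paths) {}

  /** Task attempt IDs to list in _SUMMARY, kept sorted. */
  final TreeSet<String> committedAttempts = new TreeSet<>();

  /** Expected x-committed header value for every file this job commits. */
  final Map<String, String> expectedAttemptByPath = new HashMap<>();

  /** Called once per .pendingset file as job commit reads them. */
  void add(PendingSet ps) {
    committedAttempts.add(ps.taskAttemptId());
    for (String path : ps.paths()) {
      expectedAttemptByPath.put(path, ps.taskAttemptId());
    }
  }

  /** Optional post-commit check over the new files and their header values. */
  List<String> mismatches(Map<String, String> headerByPath) {
    List<String> bad = new ArrayList<>();
    for (Map.Entry<String, String> e : headerByPath.entrySet()) {
      String expected = expectedAttemptByPath.get(e.getKey());
      // No entry means the file's provenance is somewhere other than this job.
      if (expected == null || !expected.equals(e.getValue())) {
        bad.add(e.getKey());
      }
    }
    Collections.sort(bad);
    return bad;
  }

  public static void main(String[] args) {
    JobCommitSummary job = new JobCommitSummary();
    job.add(new PendingSet("job_1", "task_1_m_0", "attempt_1_m_0_1", List.of("dest/a")));
    job.add(new PendingSet("job_1", "task_1_m_1", "attempt_1_m_1_0", List.of("dest/b")));
    System.out.println("_SUMMARY attempts: " + job.committedAttempts);
    Map<String, String> listing = Map.of(
        "dest/a", "attempt_1_m_0_1",
        "dest/b", "attempt_1_m_1_0",
        "dest/c", "attempt_9_m_0_0"); // not in any .pendingset
    System.out.println("mismatches: " + job.mismatches(listing)); // [dest/c]
  }
}
```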
This could be used in integration testing, and maybe in production too.
Risks:
* leaks a bit of information
* uses up a header. Maybe: have a generic taskInfo header which can be extended
to contain more than just the task attempt ID
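One way a generic taskInfo header could stay extensible is to pack key=value pairs into a single value, with readers ignoring keys they don't recognise. This is a sketch only: the `;`/`=` encoding, the key names and the class are hypothetical, and it assumes keys and values never contain either separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical encoding for a single, extensible taskInfo header.
 * Assumes keys and values never contain ';' or '='.
 */
final class TaskInfoHeader {

  /** Joins the fields as key=value pairs separated by ';', in map order. */
  static String encode(Map<String, String> fields) {
    StringJoiner joiner = new StringJoiner(";");
    for (Map.Entry<String, String> e : fields.entrySet()) {
      joiner.add(e.getKey() + "=" + e.getValue());
    }
    return joiner.toString();
  }

  /** Parses every key=value pair; callers simply ignore keys they do not know. */
  static Map<String, String> decode(String header) {
    Map<String, String> fields = new LinkedHashMap<>();
    if (header == null || header.isEmpty()) {
      return fields;
    }
    for (String pair : header.split(";")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        fields.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    Map<String, String> info = new LinkedHashMap<>();
    info.put("job", "job_1_0001");
    info.put("task", "task_1_0001_m_000003");
    info.put("attempt", "attempt_1_0001_m_000003_1");
    String header = encode(info);
    System.out.println(header);
    System.out.println(decode(header).get("attempt")); // attempt_1_0001_m_000003_1
  }
}
```

Adding a new field later is then a new key, with no change to the header name or to older readers.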
ps: Reviewed the magic committer. Confirmed: task commit writes the .pendingset
file to $jobAttemptDir/$taskId.pendingset with overwrite=false. So more than one
task attempt may commit.
With overwrite=false, it's actually the first which wins; there's a tiny window
of risk of overlap.
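A toy model of "first wins, with a tiny window". It assumes (this is an assumption, not a description of the S3A code path) that a no-overwrite create on the store is an existence probe followed by a separate write, with no atomic create-if-absent. Sequential commits leave the first attempt's file; if both probes run before either write lands, both attempts proceed and the later write survives.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of create(path, overwrite=false) as probe-then-write.
 * Hypothetical; for illustrating the overlap window only.
 */
final class NoOverwriteModel {

  private final Map<String, String> store = new HashMap<>();

  /** Step 1 of a no-overwrite create: true if the path is free. */
  boolean probe(String path) {
    return !store.containsKey(path);
  }

  /** Step 2: the write itself, which does not re-check. */
  void write(String path, String attempt) {
    store.put(path, attempt);
  }

  String get(String path) {
    return store.get(path);
  }

  public static void main(String[] args) {
    String path = "jobAttemptDir/task_000003.pendingset";

    // Sequential commits: attempt_1's probe fails, so the first writer wins.
    NoOverwriteModel seq = new NoOverwriteModel();
    if (seq.probe(path)) seq.write(path, "attempt_0");
    if (seq.probe(path)) seq.write(path, "attempt_1");
    System.out.println("sequential: " + seq.get(path));  // attempt_0

    // Overlapping commits: both probes pass before either write lands.
    NoOverwriteModel race = new NoOverwriteModel();
    boolean ok0 = race.probe(path);
    boolean ok1 = race.probe(path);
    if (ok0) race.write(path, "attempt_0");
    if (ok1) race.write(path, "attempt_1");
    System.out.println("overlapping: " + race.get(path)); // attempt_1
  }
}
```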
Proposed: allow overwrites, so guaranteeing that the last taskAttempt to do the
write wins.
# the last task attempt to write is most likely to be the successful one.
Reason: a second attempt is only committed if the first task attempt failed, or
did not respond to a taskCommit request in a timely manner.
# assuming time moves forwards, no GC pauses, etc., task attempt #2 will
inevitably be invoked after attempt #1.
# if task attempt #1 had successfully committed, but not returned, then the job
considers it a failure. Therefore, attempt #2 should be the one which succeeds.
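The argument above can be sketched as a tiny model, under the same assumptions stated in the list (time moves forwards, no GC pauses, and a later attempt is only invoked once the job has given up on the earlier one). With overwrites allowed, the surviving .pendingset is the one written last, which is the attempt the job treats as successful. The `Write` record, the logical timestamps and the attempt names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of "last task attempt to write wins" with overwrite=true. */
final class LastWriterWins {

  /** A .pendingset write by one task attempt at a logical time. */
  record Write(String taskAttemptId, long time) {}

  /** With overwrites allowed, the surviving file is the one written last. */
  static String survivor(List<Write> writes) {
    return writes.stream()
        .max(Comparator.comparingLong(Write::time))
        .map(Write::taskAttemptId)
        .orElseThrow();
  }

  public static void main(String[] args) {
    // attempt_0 commits at t=10 but never reports back, so the job marks it failed
    // and invokes attempt_1, which commits at t=25 and is treated as successful.
    List<Write> writes = List.of(new Write("attempt_0", 10), new Write("attempt_1", 25));
    String successfulPerJob = "attempt_1";
    System.out.println(survivor(writes).equals(successfulPerJob)); // true
  }
}
```

The model only holds while its assumptions do; a paused attempt #1 writing after attempt #2 would break it, which is why those caveats appear in the list.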
> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
> Key: HADOOP-15107
> URL: https://issues.apache.org/jira/browse/HADOOP-15107
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> I'm writing up the paper on the committers, one which, being a proper
> paper, requires me to show that the committers work.
> # define the requirements of a "Correct" committed job (this applies to the
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is
> implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset
> lists from committed tasks to the final destination, where they are read and
> committed).
> # Show the magic committer also works.
> I'm now not sure that the magic committer works.