[jira] [Commented] (HADOOP-15107) Prove the correctness of the new committers, or fix where they are not correct

Steve Loughran (JIRA) Tue, 12 Dec 2017 07:43:21 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287767#comment-16287767 ]


Steve Loughran commented on HADOOP-15107:
-----------------------------------------

I'd like to have the option of setting an x-committed header on created files, 
declaring the job/task ID at the time the multipart upload (MPU) was created.

This would allow us to examine the list of files in a destination and assert 
that every object was created by a successful task attempt which we know of, 
not an unsuccessful one.

We could also have the _SUMMARY file include a list of successful tasks:
* add a task attempt ID to the .pending and .pendingset files (the single 
commit has a task ID, but I want the full set: job, task, task attempt)
* job commit builds the list of committed tasks as it reads the set of 
.pendingset files
* we can add an optional post-commit check to verify that every new file has a 
header matching its entry, i.e. that no file was committed whose provenance 
was elsewhere
* and, of course, it gives you some diagnostics when tracing the provenance of 
files
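The optional post-commit check could be as simple as the sketch below. It assumes job commit has already collected the task attempt IDs from the .pendingset files, and that listing the destination yields, for each new file, the attempt ID from its provenance header (null when the header is absent). ProvenanceCheck and unexplainedFiles are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical post-commit provenance check (not an existing Hadoop API).
 * committedAttempts: task attempt IDs gathered while job commit reads the .pendingset files.
 * attemptByPath: for each new file, the attempt ID in its provenance header, or null if absent.
 */
final class ProvenanceCheck {

  /** Returns the paths whose header is missing or names an attempt that was never committed. */
  static List<String> unexplainedFiles(Map<String, String> attemptByPath,
                                       Set<String> committedAttempts) {
    List<String> unexplained = new ArrayList<>();
    for (Map.Entry<String, String> entry : attemptByPath.entrySet()) {
      String attempt = entry.getValue();
      // Null is checked first so contains() is never called with null.
      if (attempt == null || !committedAttempts.contains(attempt)) {
        unexplained.add(entry.getKey());
      }
    }
    return unexplained;
  }

  public static void main(String[] args) {
    Map<String, String> listing = new TreeMap<>(); // sorted, so output order is stable
    listing.put("dest/part-0000", "attempt_1_0001_m_000000_0");
    listing.put("dest/part-0001", "attempt_1_0001_m_000001_0"); // written by a failed attempt
    listing.put("dest/part-0002", null);                        // no provenance header
    Set<String> committed = Set.of("attempt_1_0001_m_000000_0", "attempt_1_0001_m_000001_1");
    System.out.println(unexplainedFiles(listing, committed)); // [dest/part-0001, dest/part-0002]
  }
}
```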

This could be used in integration testing, and maybe in production too.

Risks:
* leaks a bit of information
* uses up a header. Maybe: have a generic taskInfo header which can be extended 
to contain more than just the task attempt ID
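One way to keep a single generic taskInfo header extensible is to store sorted key=value pairs and have readers ignore keys they do not recognise. This is a hypothetical encoding: the class TaskInfoHeader, the ';' and '=' separators and the key names are all assumptions, and it presumes keys and values never contain those separators.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/**
 * Hypothetical generic "taskInfo" header value: sorted key=value pairs joined by ';'.
 * New keys can be added later; readers look up only the keys they understand.
 * Assumes keys and values never contain ';' or '='.
 */
final class TaskInfoHeader {

  static String encode(Map<String, String> info) {
    StringJoiner joiner = new StringJoiner(";");
    // Sort the keys so the same info always produces the same header value.
    for (Map.Entry<String, String> e : new TreeMap<>(info).entrySet()) {
      joiner.add(e.getKey() + "=" + e.getValue());
    }
    return joiner.toString();
  }

  static Map<String, String> decode(String header) {
    Map<String, String> info = new TreeMap<>();
    if (header.isEmpty()) {
      return info;
    }
    for (String pair : header.split(";")) {
      int eq = pair.indexOf('=');
      if (eq > 0) { // skip malformed fragments rather than failing
        info.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
    return info;
  }

  public static void main(String[] args) {
    Map<String, String> info = new TreeMap<>();
    info.put("attempt", "attempt_1_0001_m_000003_1");
    info.put("job", "job_1_0001");
    String header = encode(info);
    System.out.println(header); // attempt=attempt_1_0001_m_000003_1;job=job_1_0001
    // A later writer adds a key; an older reader still finds "attempt".
    System.out.println(decode(header + ";host=worker-7").get("attempt"));
  }
}
```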


ps: I reviewed the magic committer. Confirmed: task commit writes the 
.pendingset file to $jobAttemptDir/$taskId.pendingset with overwrite=false. So 
more than one task attempt may commit.

With overwrite=false, it's actually the first attempt which wins; there's a 
tiny window of risk of overlap.
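The toy model below illustrates where that window could come from, assuming overwrite=false is implemented as an existence check followed by a separate PUT, with no atomic create-if-absent in the store. NoOverwriteRace is a hypothetical in-memory stand-in, not S3A code: run sequentially, the first attempt wins and the second task commit fails; if both checks happen before either PUT, the second attempt silently overwrites the first.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model, not S3A code: create(path, overwrite=false) as a non-atomic
 * existence check followed by a PUT.
 */
final class NoOverwriteRace {
  private final Map<String, String> objects = new HashMap<>();

  /** Step 1 of a non-overwriting create: fail if the object already exists. */
  void checkAbsent(String path) {
    if (objects.containsKey(path)) {
      throw new IllegalStateException("File exists: " + path);
    }
  }

  /** Step 2: the PUT itself, which replaces anything written since the check. */
  void put(String path, String data) {
    objects.put(path, data);
  }

  /** Attempt 0 commits fully before attempt 1 starts. */
  static String sequential(String path) {
    NoOverwriteRace store = new NoOverwriteRace();
    store.checkAbsent(path);
    store.put(path, "attempt_0");
    try {
      store.checkAbsent(path);
      store.put(path, "attempt_1");
    } catch (IllegalStateException ignored) {
      // The second task commit fails, so the first attempt wins.
    }
    return store.objects.get(path);
  }

  /** Both existence checks run before either PUT: the overlap window. */
  static String overlapping(String path) {
    NoOverwriteRace store = new NoOverwriteRace();
    store.checkAbsent(path);       // attempt 0: check passes
    store.checkAbsent(path);       // attempt 1: check also passes
    store.put(path, "attempt_0");
    store.put(path, "attempt_1");  // overwrites despite overwrite=false
    return store.objects.get(path);
  }

  public static void main(String[] args) {
    String path = "jobAttemptDir/task_1_0001_m_000003.pendingset";
    System.out.println(sequential(path));  // attempt_0
    System.out.println(overlapping(path)); // attempt_1
  }
}
```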

Proposed: allow overwrites, thereby guaranteeing that the last task attempt to 
do the write wins.
# The last task attempt to write is most likely to be the successful one. 
Reason: a second attempt is only committed if the first task attempt failed, or 
did not respond in a timely manner to a taskCommit request.
# Assuming time moves forwards, no GC pauses, etc., task attempt #2 will 
inevitably be invoked after attempt #1.
# If task attempt #1 had successfully committed but not returned, then the job 
considers it a failure. Therefore, attempt #2 should be the one which succeeds.
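A similar toy model of the proposal, assuming commits are applied in the order they actually happen (the "time moves forwards, no GCs" premise above). LastWriterWins and its Commit record are hypothetical, and the IDs are illustrative; with overwrite=true, each task's .pendingset ends up holding the last attempt to commit it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the proposal, not S3A code: task commit saves
 * $jobAttemptDir/$taskId.pendingset with overwrite=true.
 * Assumes commits are applied in real-time order.
 */
final class LastWriterWins {

  /** One task commit: which task, and which attempt of it, wrote the .pendingset. */
  record Commit(String taskId, String attemptId) {}

  /** Applies commits in order; returns taskId -> attempt whose .pendingset survives. */
  static Map<String, String> pendingsets(List<Commit> commitsInOrder) {
    Map<String, String> byTask = new LinkedHashMap<>();
    for (Commit c : commitsInOrder) {
      byTask.put(c.taskId(), c.attemptId()); // overwrite=true: a later attempt replaces an earlier one
    }
    return byTask;
  }

  public static void main(String[] args) {
    // Attempt 0 of task 000003 committed but never answered the taskCommit request,
    // so the job treated it as failed and committed attempt 1.
    List<Commit> commits = List.of(
        new Commit("task_1_0001_m_000003", "attempt_1_0001_m_000003_0"),
        new Commit("task_1_0001_m_000004", "attempt_1_0001_m_000004_0"),
        new Commit("task_1_0001_m_000003", "attempt_1_0001_m_000003_1"));
    System.out.println(pendingsets(commits));
  }
}
```

Job commit would then read attempt 1's .pendingset for task 000003, which is the attempt the job considers successful.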

> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15107
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> I'm writing the paper on the committers; being a proper paper, it requires me 
> to show that the committers work.
> # define the requirements of a "Correct" committed job (this applies to the 
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is 
> implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset 
> lists from committed tasks to the final destination, where they are read and 
> committed)
> # Show the magic committer also works.
> I'm now not sure that the magic committer works.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
