[jira] [Commented] (HADOOP-15107) Prove the correctness of the new committers, or fix where they are not correct

Steve Loughran (JIRA) Tue, 12 Dec 2017 02:56:25 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287437#comment-16287437
 ]


Steve Loughran commented on HADOOP-15107:
-----------------------------------------

I'm actually going to argue that, based on the examples of v1 and v2, failures 
during job or task commit may be handled simply by failing the application; 
what is important is that (a) the job becomes aware of the failure & can choose 
how to react, and (b) at no point subsequently will the data of a partitioned 
task become visible.

Here are my criteria for "Correct":

# Completeness of output: You get what was committed
# Exclusivity of output: And not what wasn't
# Continuity of correctness: A dead task attempt stays dead
# Consistency of the commit: Addresses S3's consistency problems, somehow
# Ability to abort: A cleaned up job no longer exists
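
As a rough illustration of the first two criteria only, here is a toy model in Python. It is a sketch, not committer code: the JobOutcome record and its fields are hypothetical names made up for this example, and "files" are just strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobOutcome:
    """Toy record of one job run; every field is a hypothetical stand-in."""
    committed: frozenset    # files written by task attempts that committed
    uncommitted: frozenset  # files written by failed or aborted attempts
    visible: frozenset      # files present in the destination after job commit


def is_complete(outcome: JobOutcome) -> bool:
    # Completeness of output: everything that was committed is visible.
    return outcome.committed <= outcome.visible


def is_exclusive(outcome: JobOutcome) -> bool:
    # Exclusivity of output: nothing from an uncommitted attempt is visible.
    return not (outcome.uncommitted & outcome.visible)
```

A run where a file from a dead attempt shows up in the destination would pass is_complete() but fail is_exclusive(). Continuity, consistency and abortability are statements about behaviour over time, so they are not captured by a single snapshot like this.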

The MPU (multipart upload) mechanism keeps output out of the destination 
until job commit. There we need to demonstrate that job commit will commit 
everything from the committed tasks, and nothing from the uncommitted ones. 
The staging committer picks up its manifests of MPUs to commit from HDFS (the 
consistent store), and relies on the v1 commit algorithm to get them there 
(completeness, exclusivity, abortability). I need to check that the v1 
algorithm handles continuity, though.
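
The property to demonstrate for job commit can be sketched as a toy Python function. This is an assumption-laden model, not the committer's implementation: upload IDs are plain strings, the manifest structure is hypothetical, and raising on an unknown upload is my own choice, in line with handling commit failures by failing the application.

```python
def job_commit(pending_uploads, committed_manifests):
    """Return (completed, aborted) upload IDs for a toy job commit.

    pending_uploads: set of all upload IDs started by any task attempt.
    committed_manifests: dict of task ID -> list of upload IDs, with one
        entry per task whose commit succeeded (hypothetical structure).
    """
    to_complete = set()
    for upload_ids in committed_manifests.values():
        to_complete.update(upload_ids)

    # A manifest that names an upload the store does not know about means
    # the output cannot be complete, so fail instead of committing a subset.
    missing = to_complete - pending_uploads
    if missing:
        raise RuntimeError(f"manifest lists unknown uploads: {sorted(missing)}")

    # Everything not named by a committed task is aborted, never completed.
    aborted = pending_uploads - to_complete
    return to_complete, aborted
```

For example, job_commit({"u1", "u2", "u3"}, {"task_0": ["u1"], "task_1": ["u2"]}) completes u1 and u2 and aborts u3, the upload left behind by an attempt that never committed.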

The magic committer relies on S3Guard for its consistency; again, you only get 
what was committed (with post-commit job cleanup to remove old uploads). I need 
to make sure that you *only* get the committed output, even if a committer is 
partitioned after being sent the commitTask() operation & before returning 
done(). If all attempts of a task PUT their manifest to the same 
$taskId.pendingset file, then we can be confident that exactly one task attempt 
of that task will have committed its work; if there is a partition, then only 
one of the multiple task attempts will commit its output.
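
The $taskId.pendingset argument can be modelled with a toy in-memory store in Python. This is only a sketch under stated assumptions: ToyStore and the helper functions are hypothetical names, and the model assumes a PUT atomically replaces whatever was at the key, so the last writer wins. It says nothing about which attempt wins, only that there is exactly one.

```python
class ToyStore:
    """In-memory stand-in for an object store where PUT overwrites."""

    def __init__(self):
        self.objects = {}

    def put(self, key, value):
        # Assumed semantics: the whole object is replaced, last writer wins.
        self.objects[key] = value


def commit_task(store, task_id, attempt_id, upload_ids):
    # Every attempt of a task writes its manifest to the same key.
    key = f"{task_id}.pendingset"
    store.put(key, {"attempt": attempt_id, "uploads": list(upload_ids)})


def manifests_for_job_commit(store):
    # Job commit sees one manifest per task: whichever PUT landed last.
    suffix = ".pendingset"
    return {
        key[: -len(suffix)]: manifest
        for key, manifest in store.objects.items()
        if key.endswith(suffix)
    }
```

If two attempts of task_0 both reach commit_task(), for instance because the first was partitioned and a second was scheduled, manifests_for_job_commit() still returns a single entry for task_0, holding only the uploads of the attempt whose PUT landed last.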



> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15107
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> I'm writing about the paper on the committers, one which, being a proper 
> paper, requires me to show the committers work.
> # define the requirements of a "Correct" committed job (this applies to the 
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is 
> implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset 
> lists from committed tasks to the final destination, where they are read and 
> committed).
> # Show the magic committer also works.
> I'm now not sure that the magic committer works.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

