[jira] [Commented] (HADOOP-15107) Prove the correctness of the new committers, or fix where they are not correct

Steve Loughran (JIRA) Mon, 19 Feb 2018 12:28:24 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369485#comment-16369485
 ]


Steve Loughran commented on HADOOP-15107:
-----------------------------------------

Patch 001

See also the [latest 
PDF|https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_draft_003]

* Apart from changes to logging, the only change to production code is in 
Paths, which now unwinds nested exceptions better.
* Logging: print details of the committer used, and warn when using 
FileOutputCommitter with s3a. Some other logs are reduced to debug, especially 
the printing of stack traces during abort, as the exception is usually an FNFE. 
Maybe that exception could be logged at debug, and the rest at info?
* Lots of doc updates. Some are minor spelling fixes, but the whole area of job 
recovery is now explored in detail.
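
The log-level question in the logging bullet could be sketched roughly as below. This is only an illustration, not the patch: the class and method names are hypothetical, and it uses java.util.logging levels (FINE standing in for debug) purely to stay self-contained.

```java
import java.io.FileNotFoundException;
import java.util.logging.Level;

/** Hypothetical helper, not part of the patch: picks a log level for an abort failure. */
final class AbortLogLevelSketch {

    /**
     * An FNFE during abort usually just means the data was already gone,
     * so it is logged quietly; anything else is reported at info.
     * Level.FINE is used here as the stand-in for "debug".
     */
    static Level levelFor(Throwable t) {
        return (t instanceof FileNotFoundException) ? Level.FINE : Level.INFO;
    }
}
```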

Regarding the correctness of committers

# I cannot prove their correctness in the absence of a specification of the 
store which I can test against. Yes, there is that draft TLA+ of mine, 
but ...
# Also, it would need skill in correctness proofs, which is now beyond me.

In the paper I have gone through what I believe the requirements for 
correctness are, and included a justification of why the v2 commit algorithm 
does not meet these requirements as far as both MRv2 and Spark expect. 
Specifically, if a task attempt fails during commit, the state of the output 
dir is unknown, so it is not safe to commit a second attempt at that task.
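
A toy model of that failure case, as a sketch only: the destination is just a set of file names, the class and method names are made up, and it assumes the two attempts produce differently named files. It is not FileOutputCommitter code.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model, not Hadoop code: the destination directory is a set of file names. */
final class V2CommitSketch {

    /**
     * v2-style task commit: each file of the attempt is moved straight into
     * the destination, one at a time. failAfter simulates the attempt dying
     * once that many files have been moved; a negative value means no failure.
     */
    static boolean commitTask(Set<String> dest, List<String> attemptFiles, int failAfter) {
        int moved = 0;
        for (String f : attemptFiles) {
            if (moved == failAfter) {
                return false; // attempt died mid-commit; dest keeps what was moved
            }
            dest.add(f);
            moved++;
        }
        return true;
    }

    /** First attempt fails part-way through commit, second attempt then commits. */
    static Set<String> failedThenRetried() {
        Set<String> dest = new TreeSet<>();
        // attempt 1 wrote two files and dies after moving one of them
        commitTask(dest, List.of("part-0000-a", "part-0001-a"), 1);
        // attempt 2 (assumed to name its output differently) commits fully
        commitTask(dest, List.of("part-0000-b"), -1);
        return dest;
    }
}
```

In this model the destination ends up holding files from both attempts, which is the "state unknown" problem: nothing in the destination says which files belong to the attempt that actually succeeded.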

The S3A committers do not suffer from this flaw, as they only commit their 
work to the dest dir in job commit, which is declared as non-repeatable. The 
Staging committer does rely on the FileOutputCommitter to commit its 
pendingset files to HDFS, but as it uses the v1 commit algorithm, it gets the 
commit semantics of that algorithm as applied to HDFS.
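
For contrast, the same kind of toy model with commit deferred to the job. Again a sketch under assumptions: names are hypothetical, a task's pending work is reduced to a list of file names, and a later attempt's task commit is assumed to replace the earlier record for that task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model, not the S3A committer code: output only appears at job commit. */
final class DeferredCommitSketch {
    // per-task record of pending files, keyed by task ID (not attempt ID)
    private final Map<String, List<String>> committedTasks = new LinkedHashMap<>();
    private final Set<String> dest = new TreeSet<>();

    /** Task commit only records what is pending; the destination is untouched. */
    void commitTask(String taskId, List<String> pendingFiles) {
        committedTasks.put(taskId, new ArrayList<>(pendingFiles));
    }

    /** Job commit is the single point where work is materialized in dest. */
    void commitJob() {
        for (List<String> files : committedTasks.values()) {
            dest.addAll(files);
        }
    }

    Set<String> destination() {
        return dest;
    }
}
```

Here a second attempt at a task can be committed safely before job commit, because the first attempt never wrote anything visible to the destination.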

> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15107
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15107-001.patch
>
>
> I'm writing about the paper on the committers, one which, being a proper 
> paper, requires me to show the committers work.
> # define the requirements of a "Correct" committed job (this applies to the 
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is 
> implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset 
> lists from committed tasks to the final destination, where they are read and 
> committed).
> # Show the magic committer also works.
> I'm now not sure that the magic committer works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

