[
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369485#comment-16369485
]
Steve Loughran commented on HADOOP-15107:
-----------------------------------------
Patch 001
See also the [latest PDF|https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_draft_003]
* Apart from changes to logging, the only change to production code is in
Paths, which now unwinds nested exceptions better
* Logging: print details on the committer used, and warn when using
FileOutputCommitter to write to s3a. Some other logs are reduced to debug,
especially the stack traces printed during abort, as the exception is usually
an FNFE. Maybe that exception should be logged at debug and the rest at info?
* Lots of doc updates. Some are minor spelling fixes, but the whole area of job
recovery is explored in detail
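To make the logging question above concrete, here is a minimal sketch of the proposed policy, written in Python rather than against the Hadoop code. The `abort_upload` function and its `delete` callable are hypothetical names used only for illustration, and `FileNotFoundError` stands in for the FNFE.

```python
import logging

log = logging.getLogger("committer.abort")


def abort_upload(delete, path):
    """Best-effort cleanup during abort; returns True if the delete succeeded.

    `delete` is a hypothetical callable that removes `path` and raises
    an OSError subclass on failure.
    """
    try:
        delete(path)
        return True
    except FileNotFoundError as e:
        # Expected during abort: there was nothing left to clean up,
        # so keep the stack trace out of the normal logs.
        log.debug("Nothing to abort at %s", path, exc_info=e)
        return False
    except OSError as e:
        # Anything else is worth surfacing, but abort still carries on.
        log.info("Failed to abort %s: %s", path, e)
        return False
```

The missing-file handler has to come first, because `FileNotFoundError` is a subclass of `OSError`.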
Regarding the correctness of committers:
# I cannot prove their correctness without a specification of the store
that I can test against. Yes, there is that draft TLA+ of mine,
but ...
# Also, it would need skill in correctness proofs, which is now beyond me.
In the paper I have gone through what I believe the requirements for
correctness are, and included a justification of why the v2 commit algorithm
does not meet these requirements as far as both MRv2 and Spark expect.
Specifically, if a task attempt fails during commit, the state of the output
dir is unknown, so it is not safe to commit a second attempt at that task.
The S3A committers do not suffer from this flaw, as they only commit their work
to the dest dir in job commit, which is declared as non-repeatable. The Staging
committer does rely on the FileOutputCommitter to commit its pendingset files
to HDFS, but as it uses the v1 commit algorithm, it gets that algorithm's
commit semantics as applied to HDFS.
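The difference between the two task commit algorithms can be sketched with a toy in-memory model. This is not Hadoop code: the dictionaries stand in for directories, the function names are made up for illustration, and it assumes that two attempts of the same task may write differently named files.

```python
class CommitFailure(Exception):
    """Simulates a task attempt dying part-way through commit."""


def v2_task_commit(dest, attempt_files, fail_after=None):
    """v2-style task commit: move files into the destination one at a time."""
    for i, (name, data) in enumerate(sorted(attempt_files.items())):
        if fail_after is not None and i == fail_after:
            raise CommitFailure(f"died after {i} file(s)")
        dest[name] = data  # immediately visible in the final output


def v1_task_commit(committed_tasks, task_id, attempt_files, fail=False):
    """v1-style task commit: publish the whole attempt in one atomic step."""
    if fail:
        raise CommitFailure("died before the rename")
    committed_tasks[task_id] = dict(attempt_files)  # models one dir rename


def v1_job_commit(dest, committed_tasks):
    """v1 job commit: only now does committed task output reach the dest."""
    for files in committed_tasks.values():
        dest.update(files)


def run(algorithm):
    """Fail the first attempt during commit, then commit a second attempt."""
    attempt1 = {"part-0000-a1": "x", "part-0001-a1": "y"}
    attempt2 = {"part-0000-a2": "x", "part-0001-a2": "y"}
    dest, committed = {}, {}
    try:
        if algorithm == "v2":
            v2_task_commit(dest, attempt1, fail_after=1)
        else:
            v1_task_commit(committed, "task0", attempt1, fail=True)
    except CommitFailure:
        pass  # the framework schedules a second attempt
    if algorithm == "v2":
        v2_task_commit(dest, attempt2)
    else:
        v1_task_commit(committed, "task0", attempt2)
        v1_job_commit(dest, committed)
    return sorted(dest)
```

In this model `run("v2")` leaves a file from the failed first attempt in the destination next to the second attempt's output, while `run("v1")` contains only the second attempt's files.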
> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
> Key: HADOOP-15107
> URL: https://issues.apache.org/jira/browse/HADOOP-15107
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15107-001.patch
>
>
> I'm writing about the paper on the committers, one which, being a proper
> paper, requires me to show the committers work.
> # define the requirements of a "Correct" committed job (this applies to the
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is
> implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset
> lists from committed tasks to the final destination, where they are read and
> committed).
> # Show the magic committer also works.
> I'm now not sure that the magic committer works.