[ https://issues.apache.org/jira/browse/HADOOP-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502200#comment-16502200 ]

Steve Loughran commented on HADOOP-15421:
-----------------------------------------

Specific contents of the _SUCCESS file:

* version marker; deserialization fails if it is missing or wrong.
* timestamps, both human- and machine-readable.
* who did the commit, where, and what.
* the committer name. Maybe we should also add the full classname.
* the list of files created; it looks like a path with the URL scheme missing. 
That's probably a bug; once it is fixed, my Spark cloud tests will presumably 
fail until I update that code.

* metrics grabbed from the job committer. I've tried to aggregate them there. 
For MR jobs, the aggregation works for all metrics where adding makes sense. 
For Spark it doesn't, because worker threads all share metrics; really we want 
FS statistics on a thread-by-thread basis. Bear in mind, however, that this 
stats gathering exists primarily because neither Spark nor MR collect these 
things; if they did, anything recorded in the success file would be irrelevant.
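To make the list above concrete, here is a minimal sketch of what such a manifest and its version-checked deserialization might look like. The field names and the version-marker string are illustrative assumptions, not the actual SuccessData schema:

```python
import json
from datetime import datetime, timezone

# Hypothetical _SUCCESS manifest; all field names are assumptions for
# illustration, not the committed schema.
VERSION_MARKER = "org.apache.hadoop.fs.s3a.commit.files.SuccessData/1"

manifest = {
    "name": VERSION_MARKER,                          # version marker
    "timestamp": 1525456890123,                      # machine-readable time
    "date": datetime.now(timezone.utc).isoformat(),  # human-readable time
    "hostname": "worker-03",                         # where the commit ran
    "committer": "magic",                            # which committer
    "filenames": ["/output/part-0000", "/output/part-0001"],
    "metrics": {"files_created": 2, "bytes_uploaded": 4096},
}

def load_manifest(text):
    """Deserialize, failing fast if the version marker is missing or wrong."""
    data = json.loads(text)
    if data.get("name") != VERSION_MARKER:
        raise ValueError("missing or unknown _SUCCESS version marker")
    return data

loaded = load_manifest(json.dumps(manifest))
```

Checking the marker before anything else gives the fail-on-missing/wrong-version behaviour described in the first bullet, and leaves room to add fields without breaking old readers.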



> Stabilise/formalise the JSON _SUCCESS format used in the S3A committers
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-15421
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15421
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> The S3A committers rely on an atomic PUT to save a JSON summary of the job to 
> the destination FS, containing files, statistics, etc. This was added for 
> internal testing, but it turns out to be useful for Spark integration testing, 
> Hive, etc. IBM's Stocator also generates a manifest.
> Proposed: come up with an (extensible) design that we are happy with as a 
> long-lived format.
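As a sketch of the integration-testing use case mentioned above: a downstream test could read the summary back from the destination and assert on it. The path layout, field name, and filesystem accessor here are assumptions for illustration:

```python
import json

def verify_success(fs_read, dest_dir):
    """Hypothetical check a Spark/Hive integration test might run: read the
    _SUCCESS manifest from the destination directory and sanity-check it.
    fs_read(path) -> bytes is an assumed filesystem accessor, and
    "filenames" is an assumed field name."""
    data = json.loads(fs_read(dest_dir + "/_SUCCESS"))
    if not data.get("filenames"):
        raise AssertionError("no files recorded in _SUCCESS")
    return data["filenames"]

# Example with an in-memory stand-in for the destination filesystem:
store = {"/out/_SUCCESS": json.dumps({"filenames": ["/out/part-0000"]}).encode()}
files = verify_success(lambda p: store[p], "/out")
```

This is exactly why a stable, formalised format matters: once external tests parse the file like this, any change to its layout becomes a compatibility break.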



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]