[jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

Steve Loughran (JIRA) Fri, 27 Jan 2017 10:47:05 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steve Loughran updated HADOOP-13786:
------------------------------------
    Attachment: HADOOP-13786-HADOOP-13345-002.patch

Patch 002

This patch:
# defines the mechanism for using multipart uploads for commits
# implements it
# adds tests
# has the tests passing, including a scale test.

It does not yet wire this up to the MRv1/v2 output committers and 
FileOutputFormat, though the MRv2 code has been changed so that the S3A 
committer can be placed behind FileOutputFormat.

What works, then? Dela

# create a file with a path {{/example/__pending/job1/task1/part-001.bin}}
# this initiates an MPU to {{/example/part-001.bin}}, to which all the 
output goes.
# when the output stream is closed, the file 
{{/example/__pending/job1/task1/part-001.bin.pending}} is created, which saves 
everything needed to commit the job
# later, a {{FileCommitActions}} instance can be created.
# {{FileCommitActions.commitAllPendingFilesInPath()}} scans a directory for 
{{.pending}} entries and commits them one by one; here, 
{{commitAllPendingFilesInPath("/example/__pending/job1/task1/")}}
# ...which causes the file {{/example/part-001.bin}} to come into existence.
# or, if you call {{abortAllPendingFilesInPath(...)}}, the pending MPUs are 
read and aborted.
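
The steps above can be modelled with a small in-memory sketch. This is not the code in the patch: the class {{PendingCommitSketch}}, its fields, the {{writeAndClose()}} helper and the format of the {{.pending}} record are all hypothetical, and two maps stand in for S3 and its multipart uploads. Only the {{__pending}} path layout, the {{.pending}} suffix and the two method names {{commitAllPendingFilesInPath}} / {{abortAllPendingFilesInPath}} come from the description above. Deleting the {{.pending}} entry after a commit or abort is a choice of this toy model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of the delayed-commit flow; NOT the S3A implementation. */
class PendingCommitSketch {
    static final String PENDING_DIR = "/__pending/";
    static final String PENDING_SUFFIX = ".pending";

    /** Objects visible in the (simulated) store: path -> contents. */
    final Map<String, String> store = new TreeMap<>();
    /** Multipart uploads in progress: upload ID -> data uploaded so far. */
    final Map<String, String> uploads = new LinkedHashMap<>();
    private int nextId = 0;

    /** Assumed mapping: /base/__pending/.../name becomes /base/name. */
    static String finalDestination(String pendingPath) {
        int marker = pendingPath.indexOf(PENDING_DIR);
        if (marker < 0) {
            throw new IllegalArgumentException("not under __pending: " + pendingPath);
        }
        String base = pendingPath.substring(0, marker);
        String name = pendingPath.substring(pendingPath.lastIndexOf('/') + 1);
        return base + "/" + name;
    }

    /** Write data to a pending path and close the stream. */
    void writeAndClose(String pendingPath, String data) {
        // The data goes to an upload targeting the final destination...
        String uploadId = "upload-" + (nextId++);
        uploads.put(uploadId, data);
        // ...and only a small record describing that upload becomes visible.
        store.put(pendingPath + PENDING_SUFFIX,
                  uploadId + " " + finalDestination(pendingPath));
    }

    /** Collect all .pending entries under a directory. */
    private List<String> pendingEntries(String dir) {
        List<String> found = new ArrayList<>();
        for (String path : store.keySet()) {
            if (path.startsWith(dir) && path.endsWith(PENDING_SUFFIX)) {
                found.add(path);
            }
        }
        return found;
    }

    /** Complete every pending upload under dir; returns the number committed. */
    int commitAllPendingFilesInPath(String dir) {
        List<String> entries = pendingEntries(dir);
        for (String entry : entries) {
            String[] record = store.remove(entry).split(" ", 2);
            // Completing the upload is what makes the destination file appear.
            store.put(record[1], uploads.remove(record[0]));
        }
        return entries.size();
    }

    /** Abort every pending upload under dir; returns the number aborted. */
    int abortAllPendingFilesInPath(String dir) {
        List<String> entries = pendingEntries(dir);
        for (String entry : entries) {
            String[] record = store.remove(entry).split(" ", 2);
            uploads.remove(record[0]); // data is discarded, nothing becomes visible
        }
        return entries.size();
    }
}
```

In this model, after {{writeAndClose("/example/__pending/job1/task1/part-001.bin", ...)}} the store holds only the {{.pending}} record. The destination file appears after {{commitAllPendingFilesInPath("/example/__pending/job1/task1/")}}, and never appears if the abort method is called instead.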


Performance? < 1s to commit a single 128MB file over a long-haul link.
{code}
Duration of time to commit 
s3a://hwdev-steve-frankfurt-new/tests3ascale/scale/commit/__pending/job_001/commit.bin.pending:
 688,701,514 nS
{code}

I think that's pretty good :)
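
For anyone wanting to reproduce a number like that: a minimal sketch of timing an operation with {{System.nanoTime()}} and printing the result with thousands separators. The class {{DurationSketch}} and its methods are made up for illustration and are not the timer used in the scale test.

```java
import java.util.Locale;

/** Hypothetical timing helper; not the class used by the scale tests. */
class DurationSketch {

    /** Run an operation and return its wall-clock duration in nanoseconds. */
    static long timeNanos(Runnable operation) {
        long start = System.nanoTime();
        operation.run();
        return System.nanoTime() - start;
    }

    /** Format a duration with comma grouping, in the style of the log above. */
    static String formatNanos(long nanos) {
        return String.format(Locale.ENGLISH, "%,d nS", nanos);
    }

    /** Convert nanoseconds to seconds. */
    static double toSeconds(long nanos) {
        return nanos / 1e9;
    }
}
```

{{DurationSketch.formatNanos(688701514L)}} gives {{688,701,514 nS}}, which is about 0.69 seconds.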

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> HADOOP-13786-HADOOP-13345-002.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider us free to expose the blobstore-ness of the s3 output 
> streams (i.e. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



