[jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

Steve Loughran (JIRA) Mon, 06 Mar 2017 05:23:19 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steve Loughran updated HADOOP-13786:
------------------------------------
    Attachment: HADOOP-13786-HADOOP-13345-007.patch

Patch 007: the (limited) tests are now working.

I'm changing direction slightly here and working on making the first committer 
a derivative of the [Netflix Committer|https://github.com/rdblue/s3committer]. 
This stages output to the local filesystem; then, in task commit, it uploads 
the generated files as multipart PUTs, and it persists the co-ordination 
information via HDFS. While this appears to add some complexity to the writing 
process, it avoids "magic" in the filesystem and, by using HDFS, doesn't need 
DynamoDB.

What it also brings is actual use in production, along with minicluster tests. 
Production use means that resilience to failures and odd execution orderings 
is more likely to have been addressed; with my own committer I'd be relearning 
how things fail.

Accordingly, I think it'd be more likely to be ready for use.

Patch 007 doesn't include any of that; it's the "before" patch.

I'm now merging in the Netflix code, using S3A and the WriteOperationHelper as 
the means of talking to S3. Their code is ASF licensed, but the copyright 
headers still say Netflix; it needs to be attached to this JIRA as a patch 
before we could think about committing it to the ASF codebase. In the meantime, 
I'll work on it locally.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch, 
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch, 
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, 
> HADOOP-13786-HADOOP-13345-007.patch
>
>
> A goal of this code is to "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm (that is, assuming that S3Guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> We consider ourselves free to expose the blobstore-ness of the S3 output 
> streams (i.e. data is not visible until close()), if we need to use that to 
> allow us to abort commit operations.
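The "blobstore-ness" mentioned in the description can be illustrated briefly. Because an object in a blobstore appears only once its write is finished, a stream can complete all of its writing while leaving publication to a later commit-or-abort decision.

This is a minimal sketch with a dict standing in for the bucket. `DeferredBlobOutputStream` and `PendingWrite` are hypothetical names, not the S3A output stream API.

```python
class PendingWrite:
    """Hypothetical: data fully written by a stream, but not yet visible in the store."""

    def __init__(self, store, key, data):
        self._store, self._key, self._data = store, key, data
        self.state = "pending"

    def commit(self):
        if self.state != "pending":
            raise RuntimeError(f"cannot commit a write that is {self.state}")
        self._store[self._key] = self._data   # the single publishing step
        self.state = "committed"

    def abort(self):
        if self.state == "pending":
            self._data = None                  # discard: the key never appears
            self.state = "aborted"


class DeferredBlobOutputStream:
    """Hypothetical stream: writes are buffered, and close() does not publish."""

    def __init__(self, store, key):
        self._store, self._key = store, key
        self._buffer = bytearray()
        self._pending = None

    def write(self, data):
        if self._pending is not None:
            raise ValueError("stream is closed")
        self._buffer.extend(data)

    def close(self):
        # Rather than making the object visible, hand back a PendingWrite so
        # that a committer can later decide to commit or abort it.
        if self._pending is None:
            self._pending = PendingWrite(self._store, self._key, bytes(self._buffer))
        return self._pending
```

The design choice is that "closing" a stream and "publishing" its output become separate steps. An abort is then just a matter of never performing the publishing step.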



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
