[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13786:
------------------------------------
    Attachment: HADOOP-13786-037.patch

HADOOP-13786 HADOOP-14531 lambda wrapper around all production s3 calls
* all invocations of s3 calls are wrapped where appropriate, either with once() 
(which does the translation), retry() or retryUntranslated()
* javadocs state the retry policy; this is propagated to give callers an idea of 
what retries already take place
* commit tests -> java 8 lambdas too
* test json serdeser in hadoop common
* checkstyle

The error handling includes improvements to translateException to recognise 
DynamoDB throttling, and also the JSON parse error which means an EOF on 
response parsing (which means, as it's after execution, that non-idempotent 
calls won't retry).
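
As a rough illustration of the once()/retry() split described above, here is a 
minimal sketch. The class name InvokerSketch, the Operation interface and the 
fixed attempt count are assumptions made for the example, not the code in the 
patch; it has no backoff, and it simplifies the policy by never retrying a 
non-idempotent operation at all.

```java
import java.io.IOException;

/** Hypothetical sketch of a lambda wrapper; not the class in the patch. */
final class InvokerSketch {

  /** An operation which may raise any exception. */
  @FunctionalInterface
  interface Operation<T> {
    T execute() throws Exception;
  }

  private final int maxAttempts;

  InvokerSketch(int maxAttempts) {
    // Always allow at least one attempt.
    this.maxAttempts = Math.max(1, maxAttempts);
  }

  /** Convert any failure into an IOException naming the action and path. */
  static IOException translate(String action, String path, Exception e) {
    if (e instanceof IOException) {
      return (IOException) e;
    }
    return new IOException(action + " on " + path + ": " + e, e);
  }

  /** Execute the operation exactly once, translating any failure. */
  static <T> T once(String action, String path, Operation<T> op)
      throws IOException {
    try {
      return op.execute();
    } catch (Exception e) {
      throw translate(action, path, e);
    }
  }

  /**
   * Execute the operation, repeating it on failure if it is idempotent.
   * Simplification: a non-idempotent operation gets a single attempt.
   */
  <T> T retry(String action, String path, boolean idempotent,
      Operation<T> op) throws IOException {
    int limit = idempotent ? maxAttempts : 1;
    IOException last = null;
    for (int attempt = 1; attempt <= limit; attempt++) {
      try {
        return once(action, path, op);
      } catch (IOException e) {
        // Remember the failure; rethrown if every attempt fails.
        last = e;
      }
    }
    throw last;
  }
}
```

A caller would then write something like 
invoker.retry("GET", path, true, () -> fetch(path)), where fetch() is whatever 
store call is being protected.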

The commit methods have been resilient to failures via the S3Lambda for a 
while; now that it's extended to all of them we can add methods to do fault 
injection on all operations. The retry logic in S3ARetryPolicy assumes that 
throttling (503), server error (500) and connection setup failures are always 
retryable. Therefore, if the client code is done right, you could run all the 
system tests with the injecting client set to throttle a limited percentage of 
the time.
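
To make the fault injection idea concrete, here is a minimal sketch of a 
wrapper which fails a configurable fraction of calls with a simulated 503. 
Everything in it (ThrottleInjectorSketch, ThrottledException, the seeded 
Random) is hypothetical and stands in for whatever the injecting client ends 
up looking like.

```java
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical sketch of a throttling fault injector. */
final class ThrottleInjectorSketch {

  /** Stand-in for a 503 "slow down" response; not an AWS SDK type. */
  static final class ThrottledException extends RuntimeException {
    final int statusCode = 503;

    ThrottledException(String message) {
      super(message);
    }
  }

  private final double probability;
  private final Random random;
  private int injected;

  /**
   * @param probability fraction of calls to fail, from 0.0 to 1.0
   * @param seed seed for the random source, so that runs are repeatable
   */
  ThrottleInjectorSketch(double probability, long seed) {
    this.probability = probability;
    this.random = new Random(seed);
  }

  /** Either raise a simulated throttle event or run the real operation. */
  <T> T call(Supplier<T> op) {
    // nextDouble() is in [0.0, 1.0): 0.0 never injects, 1.0 always does.
    if (random.nextDouble() < probability) {
      injected++;
      throw new ThrottledException("injected throttle");
    }
    return op.get();
  }

  /** Number of failures injected so far. */
  int injectedCount() {
    return injected;
  }
}
```

If every production call is already wrapped in a retrying invoker, a test run 
with a low probability should pass unchanged, and any failure points at a call 
which is missing its wrapper.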

Although developed in the committer, we could tease this out (along with the 
moved WriteOperationsHelper) and add it to trunk standalone. That'd reduce the 
size of the HADOOP-13786 diff, and provide a single large patch for people to 
cherry pick. Though if they want to backport to branch-2 they get to convert 
every single lambda-exp into a Callable, which, even though IDEA can automate 
it, will make for uglier code than:

{code}
    S3Object object = invoke.retry(text, uri, true,
        () -> client.getObject(request));
{code}
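
For comparison, the anonymous-class form a Java 7 backport would need looks 
roughly like the sketch below. The retry() method here is a hypothetical 
stand-in which just runs the operation once, and fetch() returns a string 
rather than an S3Object so that the example is self-contained.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of the pre-lambda calling convention. */
final class CallableFormSketch {

  /** Stand-in for the retrying invoker; this version runs the operation once. */
  static <T> T retry(String text, String uri, boolean idempotent,
      Callable<T> operation) throws Exception {
    return operation.call();
  }

  /** The same call shape as the lambda example, written without lambdas. */
  static String fetch(final String key) throws Exception {
    return retry("GET", key, true, new Callable<String>() {
      @Override
      public String call() throws Exception {
        // Placeholder for the real store call.
        return "contents of " + key;
      }
    });
  }
}
```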

Anyway, I plan to continue with dev & test of the error handling in the 
committer branch, which, after all, depends on resilience of all its 
operations, even in the presence of transient failures. Once it's stable it'd 
be something to pull out and get in standalone.

> Add S3Guard committer for zero-rename commits to S3 endpoints
> -------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: cloud-intergration-test-failure.log, 
> HADOOP-13786-036.patch, HADOOP-13786-037.patch, 
> HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch, 
> HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, 
> HADOOP-13786-HADOOP-13345-005.patch, HADOOP-13786-HADOOP-13345-006.patch, 
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch, 
> HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, 
> HADOOP-13786-HADOOP-13345-011.patch, HADOOP-13786-HADOOP-13345-012.patch, 
> HADOOP-13786-HADOOP-13345-013.patch, HADOOP-13786-HADOOP-13345-015.patch, 
> HADOOP-13786-HADOOP-13345-016.patch, HADOOP-13786-HADOOP-13345-017.patch, 
> HADOOP-13786-HADOOP-13345-018.patch, HADOOP-13786-HADOOP-13345-019.patch, 
> HADOOP-13786-HADOOP-13345-020.patch, HADOOP-13786-HADOOP-13345-021.patch, 
> HADOOP-13786-HADOOP-13345-022.patch, HADOOP-13786-HADOOP-13345-023.patch, 
> HADOOP-13786-HADOOP-13345-024.patch, HADOOP-13786-HADOOP-13345-025.patch, 
> HADOOP-13786-HADOOP-13345-026.patch, HADOOP-13786-HADOOP-13345-027.patch, 
> HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-028.patch, 
> HADOOP-13786-HADOOP-13345-029.patch, HADOOP-13786-HADOOP-13345-030.patch, 
> HADOOP-13786-HADOOP-13345-031.patch, HADOOP-13786-HADOOP-13345-032.patch, 
> HADOOP-13786-HADOOP-13345-033.patch, HADOOP-13786-HADOOP-13345-035.patch, 
> objectstore.pdf, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
