[
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560016#comment-16560016
]
Steve Loughran commented on HADOOP-15426:
-----------------------------------------
Commentary on this patch
* automatically sets things up. Aaron's original patch did this only if the test
bucket declared its region; now we ask the store for that.
* if scale is set, it will try to decrease the capacity of the DDB store (if
needed) and, if it did so, increase it again at the end of the tests (see the
sketch after this list)
* not as rigorous a test of resilience as running the whole suite against a
shrunk-down table
* the further you are from the DDB table, the less load you can put on it
without expanding the thread count
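Roughly what that shrink/restore cycle looks like against the SDK, as a sketch
only; the table name, the 1/1 test capacity and the bare client builder below
are placeholders for illustration, not what the patch actually does:
{code}
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputDescription;
import com.amazonaws.services.dynamodbv2.model.UpdateTableRequest;

/** Sketch: shrink the table's provisioned IO for the test, restore it after. */
public class ThrottleTestCapacity {
  public static void main(String[] args) throws Exception {
    String table = "s3guard-metadata";   // placeholder table name
    long testRead = 1, testWrite = 1;    // tiny capacity to provoke throttling
    AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard().build();

    // remember the original capacity so it can be restored afterwards
    ProvisionedThroughputDescription original =
        ddb.describeTable(table).getTable().getProvisionedThroughput();
    long origRead = original.getReadCapacityUnits();
    long origWrite = original.getWriteCapacityUnits();

    // only shrink if the table is currently bigger than the test capacity
    if (origRead > testRead || origWrite > testWrite) {
      // in practice, wait for the table to go ACTIVE after each update
      ddb.updateTable(new UpdateTableRequest()
          .withTableName(table)
          .withProvisionedThroughput(new ProvisionedThroughput(testRead, testWrite)));
      try {
        // ... run the throttling tests here ...
      } finally {
        // scale back up, whatever the test outcome
        ddb.updateTable(new UpdateTableRequest()
            .withTableName(table)
            .withProvisionedThroughput(new ProvisionedThroughput(origRead, origWrite)));
      }
    }
  }
}
{code}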
I'm not sure that shrinking the DDB table automatically is a good idea.
# There's a limited number of times you can decrease capacity each day before
you get blocked (the sketch after this list shows one way to check for that)
# once that daily limit is exceeded, this test will fail because it can no
longer shrink the table and so can't trigger throttled IO
# and there's a risk that you can't expand IO again after the test, so you get
stuck with a crippled store
# and if the table is shared with other buckets, that breaks them all
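If the automatic shrinking stays, the decrease budget could at least be probed
first so a run degrades to skipped tests rather than failures. A rough sketch,
where the threshold of 4 is an assumed default rather than the exact DynamoDB
quota, and the helper itself is illustrative, not in the patch:
{code}
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputDescription;

/** Sketch: report whether the daily capacity-decrease budget looks spent. */
public final class DecreaseBudget {

  // illustrative threshold; the real DynamoDB decrease quota is more nuanced
  private static final long MAX_DECREASES_PER_DAY = 4;

  private DecreaseBudget() {
  }

  public static boolean canShrink(AmazonDynamoDB ddb, String table) {
    ProvisionedThroughputDescription throughput =
        ddb.describeTable(table).getTable().getProvisionedThroughput();
    return throughput.getNumberOfDecreasesToday() < MAX_DECREASES_PER_DAY;
  }
}
{code}
Test setup could then Assume.assumeTrue(canShrink(...)) and skip cleanly
instead of failing mid-run.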
Two options:
# cut the code to shrink the DDB table, and skip the tests if the table is
considered too big (>= 10?)
# make a profile -Pthrottling to test the impact of throttling: shrink the
table.
We might also want to make the timeout of all store tests configurable so a
full throttled test run will not time out; a sketch of that follows.
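For the configurable timeout, something like the following in a shared test
base would be enough; the property name and default below are invented for
illustration, and the real wiring would go through the existing S3A test
constants:
{code}
import org.junit.Rule;
import org.junit.rules.Timeout;

/** Sketch: let a throttled run stretch the per-test timeout via a property. */
public abstract class AbstractThrottlingTestBase {

  // hypothetical property: -Dtest.throttling.timeout.seconds=1200 with -Pthrottling
  private static final int TIMEOUT_SECONDS =
      Integer.getInteger("test.throttling.timeout.seconds", 600);

  @Rule
  public Timeout testTimeout = Timeout.seconds(TIMEOUT_SECONDS);
}
{code}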
> Make S3guard client resilient to DDB throttle events and network failures
> -------------------------------------------------------------------------
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, Screen
> Shot 2018-07-24 at 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png,
> Screen Shot 2018-07-25 at 16.28.53.png
>
>
> Managed to create this on a parallel test run:
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file:
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
> The level of configured provisioned throughput for the table was exceeded.
> Consider increasing your provisioning level with the UpdateTable API.
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of
> configured provisioned throughput for the table was exceeded. Consider
> increasing your provisioning level with the UpdateTable API. (Service:
> AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
> at
> {code}
> We should be able to handle this. It's a 400 "bad things happened" error
> though, not the 503 which S3 returns.
> h3. We need a retry handler for DDB throttle operations
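As a minimal sketch of what such a retry handler could look like: plain
exponential backoff around the DDB call, catching the throttle exception from
the stack trace above. The attempt limit and sleep values are placeholders,
and a real fix would plug into the S3A invoker/retry policy machinery rather
than hand-roll the loop:
{code}
import java.util.concurrent.Callable;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

/** Sketch: retry DDB operations which fail with throttling (400) responses. */
public final class DdbThrottleRetrier {

  private DdbThrottleRetrier() {
  }

  public static <T> T retry(Callable<T> operation) throws Exception {
    final int maxAttempts = 10;    // placeholder limit
    long sleepMillis = 100;        // initial backoff, doubled on each throttle
    for (int attempt = 1; ; attempt++) {
      try {
        return operation.call();
      } catch (ProvisionedThroughputExceededException e) {
        if (attempt >= maxAttempts) {
          throw e;                 // give up: surface the throttle failure
        }
        Thread.sleep(sleepMillis); // back off before retrying
        sleepMillis = Math.min(sleepMillis * 2, 10_000);
      }
    }
  }
}
{code}
The production version would also need to count and log throttle events, and
only wrap operations which are safe to repeat.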