[
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609005#comment-16609005
]
Steve Loughran commented on HADOOP-15426:
-----------------------------------------
bq. We are almost to the point where the slogan of S3A can be Never Give Up*
bq. *(until your retry policy has been exhausted)
** Until you run out of credit with AWS or your employer.
FWIW, I'm planning to conduct some "experiments" to measure the
throughput/throttling of those services which AWS never documents but which S3A
implicitly uses:
# KMS-based encrypt/decrypt. Big question here: if every GET on a random-IO S3
stream triggers a new KMS read, how does throughput degrade when many clients
are doing random IO, with and without KMS encryption? (A rough probe for this is
sketched after this list.)
# Session/assume role: what are the limits here? Every S3A client creation will
trigger one login, so how many clients can a single AWS customer create *across
all their apps*? Remember: if >1 bucket is used in a query, there's a login per
bucket.
# IAM metadata queries. Not used (yet) other than for IAM auth, but again,
it'll be throttled. There's static caching there though, so it is O(1) per
process.
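To make the KMS experiment concrete, here is a minimal sketch of the kind of
probe I have in mind; it is not one of the attached tests, and the class name,
bucket, path and read count are all invented. It opens an SSE-KMS encrypted
object through S3A with fadvise=random and hammers it with positioned reads;
if each GET needs a KMS decrypt, throttling should show up as rising latency or
throttle exceptions once enough copies run in parallel.
{code}
// Illustrative probe only: bucket, path and read count are made up.
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class KmsRandomReadProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // random fadvise: every positioned read becomes its own ranged GET,
    // and with SSE-KMS each GET should need its own KMS decrypt call.
    conf.set("fs.s3a.experimental.input.fadvise", "random");

    // hypothetical SSE-KMS encrypted object
    Path path = new Path("s3a://some-kms-encrypted-bucket/datasets/large-file");
    try (FileSystem fs = FileSystem.get(path.toUri(), conf);
         FSDataInputStream in = fs.open(path)) {
      long len = fs.getFileStatus(path).getLen();
      byte[] buf = new byte[4096];
      Random rnd = new Random(0);
      long start = System.currentTimeMillis();
      for (int i = 0; i < 10_000; i++) {
        long pos = Math.floorMod(rnd.nextLong(), Math.max(1, len - buf.length));
        in.readFully(pos, buf);   // positioned read => separate GET in random mode
      }
      System.out.println("10000 random reads took "
          + (System.currentTimeMillis() - start) + " ms");
    }
  }
}
{code}
Scaled out across many processes and EC2 VMs, and run against both an SSE-KMS
bucket and an unencrypted one, the difference in read throughput should expose
any KMS throttling.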
I won't be making these -Pscale tests; they'll end up in my Spark cloud
integration tests, where I really can make serious attempts to overload an
account across multiple EC2 VMs. Given I share that account with my colleague
developers, I'll only run those tests with some co-ordination (i.e. I'll do it
while they are asleep), or set up a special AWS a/c for load tests.
Future work: maybe a paper, "Exploring AWS throttling services in Hadoop, Hive
and Spark applications".
> Make S3guard client resilient to DDB throttle events and network failures
> -------------------------------------------------------------------------
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
> Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch,
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch,
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch,
> HADOOP-15426-009.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot
> 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen
> Shot 2018-07-27 at 14.07.38.png,
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> Managed to create this on a parallel test run:
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file:
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
> The level of configured provisioned throughput for the table was exceeded.
> Consider increasing your provisioning level with the UpdateTable API.
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of
> configured provisioned throughput for the table was exceeded. Consider
> increasing your provisioning level with the UpdateTable API. (Service:
> AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
> at
> {code}
> We should be able to handle this. Note it surfaces as a 400 "bad things
> happened" error, though, not the 503 which S3 sends for throttling.
> h3. We need a retry handler for DDB throttle operations
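A minimal sketch of the shape such a retry handler could take, built on
Hadoop's org.apache.hadoop.io.retry utilities rather than on anything in the
attached patches; the class name, method name and backoff numbers below are
illustrative only, not S3Guard's actual ones.
{code}
// Rough sketch: exponential backoff on DynamoDB's
// ProvisionedThroughputExceededException, rethrow anything else.
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;

public class DdbThrottleRetry {

  // up to 9 retries, starting at 250ms and backing off exponentially
  private final RetryPolicy throttlePolicy =
      RetryPolicies.exponentialBackoffRetry(9, 250, TimeUnit.MILLISECONDS);

  /** Run a DynamoDB operation, retrying only on throughput throttling. */
  public <T> T retry(Callable<T> operation) throws Exception {
    int attempts = 0;
    while (true) {
      try {
        return operation.call();
      } catch (ProvisionedThroughputExceededException e) {
        RetryAction action = throttlePolicy.shouldRetry(e, attempts++, 0, true);
        if (action.action != RetryAction.RetryDecision.RETRY) {
          throw e;                          // retries exhausted: rethrow
        }
        Thread.sleep(action.delayMillis);   // back off, then try again
      }
    }
  }
}
{code}
Presumably the real fix wires this sort of policy into the same retry and
exception-translation layer S3A already uses for S3 throttles, so the throttle
only surfaces as an AWSServiceThrottledException once retries are exhausted.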