[
https://issues.apache.org/jira/browse/HADOOP-14661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234612#comment-17234612
]
Asier Arambarri Beldarrain edited comment on HADOOP-14661 at 11/18/20, 1:25 PM:
--------------------------------------------------------------------------------
This is the solution for my case, regarding this error with S3A and Spark (but
I believe it can be replicated in other environments).
The key is the value of the *fs.s3a.s3.client.factory.impl* property. By
default, it is set to *DefaultS3ClientFactory*.
So what's wrong with this factory? Well, it doesn't include any requester-pays
related header, as seen in its source code:
[https://github.com/apache/hadoop/blob/e02b102/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java]
The solution is to implement a custom client factory that extends the default
one and *adds the request-payer header* to the AWS client configuration
({{awsConf}}).
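{code:java}
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import org.apache.hadoop.fs.s3a.DefaultS3ClientFactory;

public class RequestPayerS3ClientFactory extends DefaultS3ClientFactory {
  @Override
  protected AmazonS3 newAmazonS3Client(AWSCredentialsProvider credentials,
      ClientConfiguration awsConf) {
    // Send the requester-pays header with every request made by this client.
    awsConf.addHeader("x-amz-request-payer", "requester");
    return new AmazonS3Client(credentials, awsConf);
  }
}
{code}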
_This is just a simplification: you could also check whether a configuration
value is set to true before adding the header. As written, this factory assumes
that all requests will be paid for by the requester if needed._
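The conditional check mentioned in the note above could look roughly like the sketch below. To stay self-contained it does not use the Hadoop or AWS SDK classes: a plain {{Map}} stands in for the configuration, and the class name {{RequesterPaysHeaderSketch}}, the method {{headersFor}} and the property name {{example.requester.pays.enabled}} are all hypothetical, chosen only for illustration. In a real factory the same check would guard the {{awsConf.addHeader(...)}} call.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: decides whether the requester-pays header should be added.
final class RequesterPaysHeaderSketch {
    // Header name and value as used in the factory from the comment.
    static final String HEADER = "x-amz-request-payer";
    static final String VALUE = "requester";
    // Hypothetical property name; not an actual Hadoop setting referenced here.
    static final String ENABLED_KEY = "example.requester.pays.enabled";

    // Returns the extra headers to send, based on the given settings.
    static Map<String, String> headersFor(Map<String, String> settings) {
        Map<String, String> headers = new HashMap<>();
        // Only add the header when the flag is explicitly set to "true".
        if (Boolean.parseBoolean(settings.getOrDefault(ENABLED_KEY, "false"))) {
            headers.put(HEADER, VALUE);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        System.out.println(headersFor(settings));
        settings.put(ENABLED_KEY, "true");
        System.out.println(headersFor(settings));
    }
}
```

With a check like this, buckets that are not requester-pays keep the default behaviour unless the flag is switched on.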
Once you *compile the class and add it to your classpath*, set the property in
the Hadoop configuration:
{code}
spark.sparkContext.hadoopConfiguration.set("fs.s3a.s3.client.factory.impl",
  "your.package.RequestPayerS3ClientFactory")
{code}
This way S3A creates its S3 client through the overridden newAmazonS3Client
method, so requests now include the x-amz-request-payer header.
> S3A to support Requester Pays Buckets
> -------------------------------------
>
> Key: HADOOP-14661
> URL: https://issues.apache.org/jira/browse/HADOOP-14661
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: common, util
> Affects Versions: 3.0.0-alpha3
> Reporter: Mandus Momberg
> Assignee: Mandus Momberg
> Priority: Minor
> Attachments: HADOOP-14661.patch
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Amazon S3 has the ability to charge the requester for the cost of accessing
> S3. This is called Requester Pays Buckets.
> In order to access these buckets, each request needs to be signed with a
> specific header.
> http://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html