Found the root cause of this, and posting it here for indexable posterity:

The root of the problem is that, at least for Samza, the application
master (AM) appears to be in charge of localization, so the AM itself
needs to be able to access the specified package source. For S3 (and
presumably other AWS services), this requires passing the login
credentials in as Java system properties, by adding the following
option to the Samza config:

yarn.am.opts=-Daws.accessKeyId=<key> -Daws.secretKey=<secret>

The AWS SDK will pick these up through its default credentials provider
chain and use them for authentication.
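
For concreteness, the relevant portion of a Samza job properties file
would look something like this (the package path, key, and secret below
are placeholders; `yarn.package.path` is the standard Samza option for
pointing at the job package):

```
# Job package hosted in a private S3 bucket, fetched via the s3a filesystem
yarn.package.path=s3a://<bucket_name>/path/to/job-package.tgz

# Credentials for the AM, which performs localization; the AWS SDK reads
# these system properties via its default credentials provider chain
yarn.am.opts=-Daws.accessKeyId=<key> -Daws.secretKey=<secret>
```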

Hth,
Malcolm

On Wed, Mar 27, 2019 at 1:41 AM Malcolm McFarland <[email protected]>
wrote:
>
> Hey folks,
>
> I've been having some trouble loading a Samza app from a private S3
> bucket on a YARN node manager running in AWS Fargate. I haven't been
> able to ensure the credentials are available despite having done the
> following:
>
> - placed the correct s3a keys in core-site.xml
> - set AWS environment variables in the shell that launches `bin/yarn
> nodemanager`
> - added the AWS variables as arguments to that command
> - set a JAVA_OPTS environment variable in that same shell with the
> aws.accessKeyId and aws.secretKey properties set correctly
> - correctly added the same Java opts property string under the
> yarn.nodemanager.container-localizer.java.opts yarn-site.xml property
> - set Samza's task.opts property to use these same values (as per the
> Samza config)
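>
> For reference, the core-site.xml entries from the first bullet look
> roughly like this (these are the standard s3a property names; the
> values are placeholders):
>
> ```xml
> <property>
>   <name>fs.s3a.access.key</name>
>   <value><key></value>
> </property>
> <property>
>   <name>fs.s3a.secret.key</name>
>   <value><secret></value>
> </property>
> ```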
>
> Even with all of the above, I'm consistently seeing this error:
>
> org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on
> <bucket_name>: com.amazonaws.AmazonClientException: No AWS Credentials
> provided by DefaultAWSCredentialsProviderChain
>   at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:144)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:334)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:277)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
>   ...
>   at org.apache.samza.job.yarn.YarnClusterResourceManager.startContainer(YarnClusterResourceManager.java:617)
>   at org.apache.samza.job.yarn.YarnClusterResourceManager.runContainer(YarnClusterResourceManager.java:578)
>   at org.apache.samza.job.yarn.YarnClusterResourceManager.launchStreamProcessor(YarnClusterResourceManager.java:298)
>   at org.apache.samza.clustermanager.AbstractContainerAllocator.runStreamProcessor(AbstractContainerAllocator.java:157)
>   at org.apache.samza.clustermanager.ContainerAllocator.assignResourceRequests(ContainerAllocator.java:52)
>   ...
> Caused by: com.amazonaws.AmazonClientException: No AWS Credentials
> provided by DefaultAWSCredentialsProviderChain:
> com.amazonaws.SdkClientException: Unable to load AWS credentials from
> any provider in the chain
>
> I'm running Hadoop 3.0.3, and this error is being produced in
> something that calls itself the Container Allocator Thread. Running
> `hadoop fs -ls s3a://<bucket_name>/` in the calling shell does work
> correctly (after first unsetting any automatically-set AWS environment
> variables), so I know the key/secret is correct. It looks to me like
> the environment that's localizing the package doesn't have access to
> the credentials, in spite of all of these steps. Has anybody come
> across a situation like this?
>
> Cheers,
> Malcolm



-- 
Malcolm McFarland
Cavulus
