Found the root cause of this, and posting it here for indexable posterity:

The root of the problem is that, at least for Samza, the application master (AM) seems to be in charge of localization, so the AM needs to be able to reach the specified package source. For S3 (and, I suspect, other AWS services), this means passing the login credentials to the AM as Java system properties, by adding the following option to the Samza config:

    yarn.am.opts=-Daws.accessKeyId=<key> -Daws.secretKey=<secret>

The AWS SDK's default credentials provider chain will pick these properties up and use them for authentication.
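For context, here's roughly how that sits in a full job config. This is just a minimal sketch; the job name, factory class wiring, and package path are placeholders, not my actual setup:

    # job.properties (names and paths below are placeholders)
    job.name=my-samza-job
    job.factory.class=org.apache.samza.job.yarn.YarnJobFactory

    # Job package is fetched from a private S3 bucket via the s3a filesystem
    yarn.package.path=s3a://<bucket_name>/path/to/my-samza-job.tgz

    # The AM performs the localization, so it needs the credentials as JVM
    # system properties; DefaultAWSCredentialsProviderChain checks the
    # aws.accessKeyId/aws.secretKey system properties
    yarn.am.opts=-Daws.accessKeyId=<key> -Daws.secretKey=<secret>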
Hth,
Malcolm

On Wed, Mar 27, 2019 at 1:41 AM Malcolm McFarland <[email protected]> wrote:
>
> Hey folks,
>
> I've been having some trouble loading a Samza app from a private S3
> bucket on a YARN node manager running in AWS Fargate. I haven't been
> able to make the credentials available despite having done the
> following:
>
> - placed the correct s3a keys in core-site.xml
> - set AWS environment variables in the shell that launches `bin/yarn
>   nodemanager`
> - added the AWS variables as arguments to that command
> - set a JAVA_OPTS environment variable in that same shell with the
>   aws.accessKeyId and aws.secretKey properties set correctly
> - added the same Java opts string under the
>   yarn.nodemanager.container-localizer.java.opts property in yarn-site.xml
> - set Samza's task.opts property to use these same values (as per the
>   Samza config)
>
> Even with all of the above, I'm consistently seeing this error:
>
> org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on
> <bucket_name>: com.amazonaws.AmazonClientException: No AWS Credentials
> provided by DefaultAWSCredentialsProviderChain
>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:144)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:334)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:277)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
>     ...
>     at org.apache.samza.job.yarn.YarnClusterResourceManager.startContainer(YarnClusterResourceManager.java:617)
>     at org.apache.samza.job.yarn.YarnClusterResourceManager.runContainer(YarnClusterResourceManager.java:578)
>     at org.apache.samza.job.yarn.YarnClusterResourceManager.launchStreamProcessor(YarnClusterResourceManager.java:298)
>     at org.apache.samza.clustermanager.AbstractContainerAllocator.runStreamProcessor(AbstractContainerAllocator.java:157)
>     at org.apache.samza.clustermanager.ContainerAllocator.assignResourceRequests(ContainerAllocator.java:52)
>     ...
> Caused by: com.amazonaws.AmazonClientException: No AWS Credentials
> provided by DefaultAWSCredentialsProviderChain :
> com.amazonaws.SdkClientException: Unable to load AWS credentials from
> any provider in the chain
>
> I'm running Hadoop 3.0.3, and this error is being produced in a thread
> that calls itself the Container Allocator Thread. Running
> `hadoop fs -ls s3a://<bucket_name>/` in the calling shell does work
> correctly, so I know that the key/secret pair is correct (making sure to
> unset any automatically-set AWS environment variables first). It looks
> to me like the environment that's localizing the package doesn't have
> access to the credentials, in spite of all of these steps. Has anybody
> come across a situation like this?
>
> Cheers,
> Malcolm

--
Malcolm McFarland
Cavulus
