[
https://issues.apache.org/jira/browse/HADOOP-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380245#comment-15380245
]
Xiao Chen commented on HADOOP-13381:
------------------------------------
I figured the clearest way to explain this is to show the call stack:
{noformat}
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.<init>(KMSClientProvider.java:461)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider$Factory.createProvider(KMSClientProvider.java:331)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider$Factory.createProvider(KMSClientProvider.java:322)
at org.apache.hadoop.crypto.key.KeyProviderFactory.get(KeyProviderFactory.java:95)
at org.apache.hadoop.util.KMSUtil.createKeyProvider(KMSUtil.java:65)
at org.apache.hadoop.hdfs.DFSUtil.createKeyProvider(DFSUtil.java:1851)
at org.apache.hadoop.hdfs.KeyProviderCache$2.call(KeyProviderCache.java:73)
at org.apache.hadoop.hdfs.KeyProviderCache$2.call(KeyProviderCache.java:70)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at org.apache.hadoop.hdfs.KeyProviderCache.get(KeyProviderCache.java:70)
at org.apache.hadoop.hdfs.DFSClient.getKeyProvider(DFSClient.java:3570)
at org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1408)
at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1521)
at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:108)
at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:59)
at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:683)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:679)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.create(FileContext.java:679)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter$1.run(AggregatedLogFormat.java:385)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter$1.run(AggregatedLogFormat.java:380)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.<init>(AggregatedLogFormat.java:379)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:246)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:456)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:421)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:386)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
So there's a {{KeyProviderCache}}, which caches the KeyProvider object by the
configured URI. Meanwhile, {{KMSClientProvider}} caches the creator's UGI as
{{actualUGI}}. This is fine for transient clients, but problematic for
long-running processes like the NodeManager.
When the NM impersonates the client and uses the client's delegation token to
run an MR job, {{KMSClientProvider}} should favor the client's DT, not the
cached one, which may have long since expired.
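To make the interaction concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use Hadoop's classes: {{User}}, {{Provider}}, the {{CACHE}} map, the URI string and the token strings are all hypothetical stand-ins for the UGI, {{KMSClientProvider}}, {{KeyProviderCache}} and the delegation tokens. It only illustrates why a provider cached by URI keeps using its creator's token, and what "favor the caller's token" could look like.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachedProviderSketch {
    /** Hypothetical stand-in for a UGI that carries a delegation token. */
    static final class User {
        final String name;
        final String token;
        User(String name, String token) { this.name = name; this.token = token; }
    }

    /** Hypothetical stand-in for a provider that remembers who created it. */
    static final class Provider {
        private final User creator;
        Provider(User creator) { this.creator = creator; }

        /** Problematic behaviour: always uses the creator's (possibly expired) token. */
        String tokenUsedCached(User caller) {
            return creator.token;
        }

        /** Suggested behaviour: favor the caller's token, fall back to the creator's. */
        String tokenUsedPreferCaller(User caller) {
            return (caller != null && caller.token != null) ? caller.token : creator.token;
        }
    }

    /** Hypothetical cache keyed by provider URI; one provider per URI for the JVM's lifetime. */
    private static final Map<String, Provider> CACHE = new ConcurrentHashMap<>();

    static Provider get(String uri, User caller) {
        // The provider is built once, by whichever caller arrives first.
        return CACHE.computeIfAbsent(uri, u -> new Provider(caller));
    }

    public static void main(String[] args) {
        String uri = "kms://example";                 // assumed URI, for illustration only
        User first = new User("alice", "token-old");  // creator of the cached provider
        User later = new User("bob", "token-new");    // a later client in the same JVM

        Provider p1 = get(uri, first);
        Provider p2 = get(uri, later);

        System.out.println(p1 == p2);                        // true
        System.out.println(p2.tokenUsedCached(later));       // token-old
        System.out.println(p2.tokenUsedPreferCaller(later)); // token-new
    }
}
```

In a short-lived client the first and only caller is the creator, so the two methods agree. In a long-running process the cached provider outlives the creator's token, which is the case this issue is about.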
> KMS clients running in the same JVM should use updated KMS Delegation Token
> ---------------------------------------------------------------------------
>
> Key: HADOOP-13381
> URL: https://issues.apache.org/jira/browse/HADOOP-13381
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Affects Versions: 2.6.0
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Priority: Critical
>
> When {{/tmp}} is set up as an EZ, one may experience YARN log aggregation
> failure after the KMS token has expired. The MR job itself runs fine, though.
> When this happens, the YARN NodeManager's log will show an
> {{AuthenticationException}} saying either that the token is expired or that
> the token can't be found in cache, depending on whether the expired token has
> already been removed in the background or not.