[
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated HADOOP-15059:
--------------------------------
Attachment: HADOOP-15059.001.patch
Attaching a patch that has the container launch process write two token files,
the legacy token format in the existing container_tokens file and the new
version 1 format in a new container_tokens-v1 file. I left the container
localizer path alone since localizers are running the same code as the
nodemanager and therefore can directly support the new v1 format as-is.
The basic idea is to tack on the "-v1" suffix to the legacy token pathname to
form the new v1 token pathname. The container launcher and container executors
both do this, so the interface between them did not have to change. The legacy
path is passed between them to indicate where both token files can be located
(once the suffix is applied to form the new token path). It's definitely not
the cleanest approach, but it was relatively simple to implement. I renamed
some fields in the container start context to make it clearer which path is
being used.
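As a rough illustration of the suffix convention described above (a minimal
sketch, not code from the patch; the class, constant, and method names are
hypothetical), both sides can derive the v1 path from the legacy path on their
own, so only the legacy path has to cross the interface:

```java
public class TokenPaths {
    // The "-v1" suffix comes from the description above; the constant name is hypothetical.
    static final String V1_SUFFIX = "-v1";

    // Derive the v1 token pathname by appending the suffix to the legacy pathname.
    static String v1TokenPath(String legacyTokenPath) {
        return legacyTokenPath + V1_SUFFIX;
    }

    public static void main(String[] args) {
        // Example path is illustrative only.
        String legacy = "/some/container/dir/container_tokens";
        System.out.println(legacy);
        System.out.println(v1TokenPath(legacy));
    }
}
```

Because the launcher and the executor apply the same rule, neither needs a
second path argument.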
This needs a lot more testing, but I was able to run a sleep job on a simple
secure pseudo-distributed cluster and manually verified that both container
token files were being written and that each was in the proper format. I also manually
forced the launcher to omit the new environment variable for the version 1
file, forcing the UGI to load the legacy token file, and that worked as well.
I have not had a chance yet to test the rolling-upgrade-with-tarball scenario
nor the native container-executor changes, but I thought it was far enough
along to at least get some feedback.
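The fallback exercised in that manual test can be sketched roughly as follows.
This is only an illustration under assumptions, not the patch itself:
HADOOP_TOKEN_FILE_LOCATION is the existing variable for the token file, while
the name used here for the new version 1 variable is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenFileSelector {
    // Existing Hadoop environment variable pointing at the token file.
    static final String LEGACY_ENV = "HADOOP_TOKEN_FILE_LOCATION";
    // Hypothetical name for the new version 1 variable; the real name is defined by the patch.
    static final String V1_ENV = "EXAMPLE_TOKEN_FILE_LOCATION_V1";

    // Prefer the v1 file when the launcher advertised it, otherwise fall back to the legacy file.
    static String selectTokenFile(Map<String, String> env) {
        String v1 = env.get(V1_ENV);
        if (v1 != null && !v1.isEmpty()) {
            return v1;
        }
        return env.get(LEGACY_ENV);
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put(LEGACY_ENV, "container_tokens");
        // With the v1 variable omitted, the legacy file is chosen.
        System.out.println(selectTokenFile(env));
        env.put(V1_ENV, "container_tokens-v1");
        System.out.println(selectTokenFile(env));
    }
}
```

An older client that knows nothing about the new variable keeps reading the
legacy file, which is the point of writing both.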
If others could take a look at the patch and/or take it for a test drive that
would be great.
> 3.0 deployment cannot work with old version MR tar ball which break rolling
> upgrade
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Reporter: Junping Du
> Assignee: Jason Lowe
> Priority: Blocker
> Attachments: HADOOP-15059.001.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tarball. The MR job failed
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1511295641738_0003_000001
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>     at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>     at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>     at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>     at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_000001/container_tokens
>     at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>     at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>     at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>     at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>     at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>     ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>     at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>     at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>     ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to an incompatible token format change between 2.9 and 3.0.
> As we claim "rolling upgrade" is supported in Hadoop 3, we should fix this
> before we ship 3.0; otherwise all running MR applications will get stuck
> during/after upgrade.
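
The "Unknown version 1 in token storage" failure in the trace above is what a
strict version check in an older reader would produce when handed a newer
file. The following is a simplified, self-contained sketch of such a check,
not the actual Credentials code; it assumes the file begins with a 4-byte
"HDTS" magic followed by a single version byte, and the class and method names
are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LegacyTokenHeaderCheck {
    // Assumed layout: 4-byte magic, then one version byte.
    static final byte[] MAGIC = "HDTS".getBytes(StandardCharsets.UTF_8);
    // An older reader in this sketch only understands version 0.
    static final byte SUPPORTED_VERSION = 0;

    // Validate the header and reject any version the reader does not know.
    static void checkHeader(byte[] fileBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Bad header found in token storage.");
        }
        byte version = in.readByte();
        if (version != SUPPORTED_VERSION) {
            throw new IOException("Unknown version " + version + " in token storage.");
        }
    }

    public static void main(String[] args) {
        byte[] v1File = {'H', 'D', 'T', 'S', 1};
        try {
            checkHeader(v1File);
        } catch (IOException e) {
            // The older reader cannot parse a file written with the newer version byte.
            System.out.println(e.getMessage());
        }
    }
}
```

A reader like this has no way to recover, which is why a legacy-format file
has to remain available to older clients during a rolling upgrade.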