[ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262809#comment-16262809
 ] 

Jason Lowe commented on HADOOP-15059:
-------------------------------------

bq. Are we going to keep binary compatibility across hadoop-2.x and hadoop-3.x?

Wire compatibility between 2.x clients and 3.x servers is a prerequisite to 
supporting a rolling upgrade from 2.x to 3.x, but I do not think everyone 
realizes wire compatibility between a 3.x client and a 2.x server is also very 
important to many of our users.  There are many cases where more than one 
cluster is involved in a workflow.  Requiring that all clusters upgrade from 
2.x to 3.x simultaneously is a huge hurdle for adoption, and most users will 
upgrade them one at a time.  As individual clusters upgrade there will be 
clients/jobs on a newly upgraded 3.x cluster trying to interact with an older 
2.x cluster.

Back to the issue of launching jobs using an incompatible token format -- 
here's a couple of options we could consider:

1) YARN nodemanager writes out *two* token credential files, the original 2.x 
file for backwards compatibility and a new 3.x file.  The 3.x UGI code looks 
for the new file and falls back to the old one if it cannot find it.  The 2.x 
code will simply load the old format from the original filename as it does 
today.

2) Application submission context contains information on which version of 
credentials to use for an application.  This gets transferred to the container 
launch context for each container, and the nodemanager writes out the 
appropriate credentials version based on what was specified in the container 
launch context.  In other words, the nodemanager knows which version of the 
credentials format the container is expecting to find and writes the token file 
in that format.


> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15059
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>            Reporter: Junping Du
>            Priority: Blocker
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_000001
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>       at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>       at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_000001/container_tokens
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>       at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>       at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>       at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>       ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>       ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to