[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated HADOOP-15059:
--------------------------------
    Attachment: HADOOP-15059.004.patch

Thanks for joining the conversation, Allen, and for pointing out the 
motivations behind the protobuf change.  Do you know of existing use cases that 
are relying on the new format?

I completely agree the new format is a great path forward for extensibility and 
portability, but unfortunately it breaks a number of existing use cases.

bq. Let's be clear: this is only a problem if one has a bundled 
hadoop-common.jar.

It's also important to point out that this is a rather common occurrence.  
Besides the typical habit of users running their *-with-dependencies.jar on the 
cluster, anyone leveraging the framework-on-HDFS approach will be bitten by 
this as soon as the nodemanager upgrades.  

Having frameworks deploy via HDFS rather than picking them up from the 
nodemanager's jars has proven to be a very useful way to better isolate apps 
during cluster rolling upgrades and support multiple versions of the framework 
on the cluster simultaneously.

bq. Is the end result of this JIRA going to be that all file formats are locked 
forever, regardless of where they come from?

I don't think so.  As discussed above, we should be able to remove support for 
the Writable format when Hadoop no longer supports 2.x apps.  Yes, that's 
likely quite a long time, but it does not have to be forever.

bq. Hadoop releases have broken rolling upgrade (and non-rolling upgrades, for 
that matter) in the middle of the 2.x stream before by removing things such as 
container execution types.

We've completed rolling upgrades across all of our clusters for every minor 
release of 2.x since rolling upgrades were first supported in 2.6, so we must 
not have hit this landmine.  Was this the removal of the dedicated Docker 
container executor in favor of the unified Linux executor that does everything?

I'm attaching a patch that implements the "bridge release(s)" approach: the 
code can read the new format but writes the old format by default, and code 
can still request the new format explicitly if necessary.  The main drawback 
is that we don't get to easily leverage the benefits of the new format, since 
it isn't the default.  However, I'm hoping native services and other things 
that need the new protobuf format can leverage dtutil to translate the 
credentials format for easier consumption.
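For illustration, here is a minimal, self-contained sketch of the 
version-dispatch idea behind the bridge approach.  The constants and method 
names below are hypothetical stand-ins, not the actual Credentials API; the 
real token-storage header layout in Hadoop may differ, so treat this only as 
a sketch of "write the old version byte by default, accept both on read."

```java
// Hypothetical sketch: a token-storage header is a magic prefix plus a
// one-byte format version.  A bridge release writes the legacy version by
// default (so 2.x readers such as an old MR framework tarball still work)
// but accepts either version on read.  Names and layout are assumptions.
public class TokenStorageSketch {
    static final byte[] MAGIC = {'H', 'D', 'T', 'S'};
    static final byte FORMAT_WRITABLE = 0;  // legacy format, readable by 2.x
    static final byte FORMAT_PROTOBUF = 1;  // new format, 3.x-only readers

    // Serialize a payload with the requested format version; callers that
    // want the new format must ask for it explicitly.
    static byte[] write(byte[] payload, byte format) {
        byte[] out = new byte[MAGIC.length + 1 + payload.length];
        System.arraycopy(MAGIC, 0, out, 0, MAGIC.length);
        out[MAGIC.length] = format;
        System.arraycopy(payload, 0, out, MAGIC.length + 1, payload.length);
        return out;
    }

    // Validate the magic and return the format version, rejecting unknown
    // versions the way a 2.x reader rejects version 1 today.
    static byte readFormat(byte[] data) {
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                throw new IllegalArgumentException("Bad token storage magic");
            }
        }
        byte version = data[MAGIC.length];
        if (version != FORMAT_WRITABLE && version != FORMAT_PROTOBUF) {
            throw new IllegalArgumentException(
                "Unknown version " + version + " in token storage.");
        }
        return version;
    }

    public static void main(String[] args) {
        byte[] legacy = write(new byte[]{42}, FORMAT_WRITABLE);
        byte[] modern = write(new byte[]{42}, FORMAT_PROTOBUF);
        System.out.println("legacy version = " + readFormat(legacy));
        System.out.println("modern version = " + readFormat(modern));
    }
}
```

A dtutil-style converter would then just be: read with the dispatching 
reader, re-write with the other format byte.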

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15059
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>            Reporter: Junping Du
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tarball. The MR job failed 
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_000001
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>       at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>       at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_000001/container_tokens
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>       at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>       at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>       at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>       at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>       ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>       at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>       ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. 
> Since we claim "rolling upgrade" is supported in Hadoop 3, we should fix 
> this before we ship 3.0; otherwise all running MR applications will get 
> stuck during/after the upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
