[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345741#comment-14345741
 ] 

Steve Loughran commented on HADOOP-11656:
-----------------------------------------

I hadn't come across the ApplicationClassLoader/HADOOP-10893 myself. It's 
**a** classloader; I'd actually have to spend time using it to be confident it 
works.
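
For anyone else who hasn't looked at this kind of classloader: the usual trick is child-first delegation, where the loader tries its own JARs before asking the parent, except for a list of prefixes that must always come from the parent. The sketch below is a generic illustration of that technique only. It is not the HADOOP-10893 code, and the class name ChildFirstClassLoader, the isParentFirst method, and the prefix list are all made up for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration of child-first delegation; not Hadoop's implementation.
class ChildFirstClassLoader extends URLClassLoader {
    // Package prefixes that are always delegated to the parent first (assumed list).
    private final String[] parentFirstPrefixes;

    ChildFirstClassLoader(URL[] urls, ClassLoader parent, String... parentFirstPrefixes) {
        super(urls, parent);
        this.parentFirstPrefixes = parentFirstPrefixes;
    }

    // JDK classes and any configured prefixes must come from the parent.
    public boolean isParentFirst(String name) {
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return true;
        }
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Look in this loader's own URLs before asking the parent.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in our JARs; fall through to normal delegation.
                }
            }
            if (c == null) {
                // Standard parent-first lookup as the fallback.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With something like this, an application could carry its own Guava while the parent-first prefixes keep the shared API classes identical on both sides. Whether that holds up in a real deployment is exactly what would need testing.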

As Vinod noted, YARN apps can bundle up their entire tarball of dependencies 
client-side and have them deployed in-cluster, picking up core-site.xml etc. 
from the live cluster by setting up the classpath of the AM correctly. This is 
what we do in Slider to ensure that there are no signature-compatibility 
problems between our AM code and the Hadoop JARs. We don't isolate classpaths, 
though; instead we make sure that we are consistent with the dependency 
versions of the AM. I think if Hadoop were OSGi-ready, then we'd consider 
deploying in an OSGi runtime like Felix.

One trouble spot, even with that tactic, is shown by HADOOP-11064: 
"UnsatisifedLinkError with hadoop 2.4 JARs on hadoop-2.6 due to NativeCRC32 
method changes". Changes in the internal JNI bindings meant that no app running 
with the Hadoop 2.4 JARs (like HBase) would run in a Hadoop 2.6-alpha cluster. 
We were lucky that I found that before 2.6 shipped; otherwise we'd have had a 
lot of complaints. The problem here is that even with HBase isolated on the 
classpath, it was picking up the hadoop-native binaries from somewhere on 
PATH/LIB or wherever, and so failing to link.
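
To make that failure mode concrete: the JVM resolves JNI libraries from java.library.path, independently of whichever classpath or classloader supplied the Java classes. Here is a minimal sketch that lists the files a System.loadLibrary call could pick up. It assumes the library's base name is "hadoop"; the class NativeLibProbe and its methods are hypothetical names for this example and are not part of Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic; shows where the JVM would look for a JNI library.
class NativeLibProbe {
    // Directories searched by System.loadLibrary, in order.
    public static List<String> librarySearchPath() {
        List<String> dirs = new ArrayList<>();
        String raw = System.getProperty("java.library.path", "");
        for (String dir : raw.split(File.pathSeparator)) {
            if (!dir.isEmpty()) {
                dirs.add(dir);
            }
        }
        return dirs;
    }

    // Files on the search path that match the platform-specific file name.
    public static List<File> candidates(String libName) {
        // Maps "hadoop" to the platform file name, e.g. libhadoop.so on Linux.
        String fileName = System.mapLibraryName(libName);
        List<File> found = new ArrayList<>();
        for (String dir : librarySearchPath()) {
            File f = new File(dir, fileName);
            if (f.isFile()) {
                found.add(f);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // "hadoop" is an assumed default; pass another base name as the first argument.
        String lib = args.length > 0 ? args[0] : "hadoop";
        System.out.println("java.library.path = " + librarySearchPath());
        System.out.println("candidates for " + System.mapLibraryName(lib) + ": " + candidates(lib));
    }
}
```

Run on a cluster node with the same JVM options as the application, this shows which native binaries are visible to the process no matter which hadoop-common JAR sits on its classpath.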

Classloader isolation & shading aren't going to be sufficient here. HADOOP-11127 
proposes some versioning, which will help, but I don't think it will let us 
load >1 hadoop.lib into a JVM. As a result, the only version of 
hadoop-common.jar which can be reliably loaded into a process is the one that 
is in sync with the version of the native library on the target machine.

> Classpath isolation for downstream clients
> ------------------------------------------
>
>                 Key: HADOOP-11656
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11656
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>              Labels: classloading, classpath, dependencies
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
