[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

Steve Loughran (JIRA) Wed, 04 Mar 2015 11:52:40 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347446#comment-14347446 ]


Steve Loughran commented on HADOOP-11656:
-----------------------------------------

[[email protected]], as someone downstream, I know you know the situation we 
have now; everyone downstream runs into this, with HBase and Oozie being the 
core pain points. Not exposing the transitive dependencies means that you can 
stop worrying about which version of Guava or protobuf Hadoop uses, leaving 
only our consistent semantics to maintain.

The native library problem means that no more than one version of the Hadoop 
JARs can be reliably loaded in a JVM, since the JVM allows a given native 
library to be loaded by only one classloader.

Now, unless I'm confused about how classloaders bootstrap, they have to be set 
up in order, classloader above classloader, with OSGi doing some magic at 
startup so that the first CL can pick up classes from external CLs and make 
them visible to others.
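The ordering described above, with each classloader sitting above its parent, can be made concrete with a small Java sketch. It is an illustration under stated assumptions, not part of any Hadoop proposal: `ClassLoaderChainSketch` and its nested `Dependency` class are hypothetical names, with `Dependency` standing in for a third-party library such as Guava on the application classpath. The sketch walks the parent chain and shows that a loader created with a `null` parent delegates only to the bootstrap loader, so it resolves JDK classes but not application classes.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderChainSketch {
    // Hypothetical stand-in for a third-party dependency on the application classpath.
    static class Dependency {}

    // Walk from a loader up through its parents; the bootstrap loader is represented by null.
    static List<String> chain(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("<bootstrap>");
        return names;
    }

    // Ask a loader to resolve a class by name without initializing it.
    static boolean visible(ClassLoader loader, String name) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader app = ClassLoaderChainSketch.class.getClassLoader();
        System.out.println("chain: " + chain(app));
        System.out.println("app loader sees Dependency: " + visible(app, Dependency.class.getName()));

        // A loader whose parent is null delegates only to the bootstrap loader,
        // so it sees JDK classes but nothing from the application classpath.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println("isolated sees java.lang.String: " + visible(isolated, "java.lang.String"));
            System.out.println("isolated sees Dependency: " + visible(isolated, Dependency.class.getName()));
        }
    }
}
```

The loader class names printed by `chain` vary by JVM version; for example, Java 9 and later place a platform loader between the application loader and the bootstrap loader, where Java 8 had an extension loader.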

Does this mean that adopting the new CL requires a whole new startup process? If 
so, it is going to be visible to everything downstream. Now, we could design 
YARN-679 to be ready for this, so that if you adopt it as the launcher for your 
app, you get the CL setup there.

But what about every single client app that wants to talk to HDFS? We may be 
able to go to HBase & Accumulo & say "new launcher", maybe go to Spark and say 
"your AM needs to do this", but it's harder to say "your general-purpose code to 
read off HDFS must now use our CL chain to work". Especially for the use case 
"webapp running in Tomcat with the classloader isolation of Java EE".

Things like that aren't going to work if we start imposing a new CL; they will 
need a switch to say "no dependency magic".

So why is this being proposed as "on-by-default"? And, since there isn't a 
clear proposal yet, are we deciding that we should be incompatible from the 
outset?

Please: give us a proposal, let's work towards an implementation, and actually 
test this downstream, including in an Oozie version (hence Tomcat tests), 
in-cluster apps, and remote client apps. Then we can consider whether or not it 
would be justifiable to say "you must do this to move to Hadoop 3".

Oh, and given the schedules, we should start planning for Java 9 & Jigsaw...



> Classpath isolation for downstream clients
> ------------------------------------------
>
>                 Key: HADOOP-11656
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11656
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>              Labels: classloading, classpath, dependencies
>
> Currently, Hadoop exposes downstream clients to a variety of third-party 
> libraries. As our code base grows and matures, we increase the set of 
> libraries we rely on. At the same time, as our user base grows, we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened several times with, e.g., Guava for HBase, Accumulo, and 
> Spark (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and do nothing to help with dependency conflicts on the driver side or for 
> folks talking to HDFS directly. This should serve as an umbrella for changes 
> needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third-party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project-specific task JIRAs will follow after I get some justifying use cases 
> written in the comments.
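Point 2 of the issue description, letting user code see only public API classes, could in principle be approximated with a filtering classloader. The following is a minimal Java sketch under assumptions, not the design this issue settled on: `ApiOnlyClassLoaderSketch`, `FilteringClassLoader`, `PublicApi`, and `Internal` are hypothetical names, and a real allow-list would more likely use package prefixes (for example `org.apache.hadoop.fs.`) than a single class name. JDK classes still resolve through the bootstrap loader; any other class is delegated to the application loader only when its name matches an allowed prefix.

```java
import java.util.Arrays;
import java.util.List;

public class ApiOnlyClassLoaderSketch {
    // Hypothetical class meant to be exposed to user code.
    static class PublicApi {}

    // Hypothetical internal/third-party class that should stay hidden.
    static class Internal {}

    // Exposes only classes whose names start with an allowed prefix.
    static class FilteringClassLoader extends ClassLoader {
        private final ClassLoader delegate;
        private final List<String> allowedPrefixes;

        FilteringClassLoader(ClassLoader delegate, List<String> allowedPrefixes) {
            super(null); // parent is the bootstrap loader, so JDK classes still resolve
            this.delegate = delegate;
            this.allowedPrefixes = allowedPrefixes;
        }

        // Called only after the bootstrap loader fails to find the class.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            for (String prefix : allowedPrefixes) {
                if (name.startsWith(prefix)) {
                    return delegate.loadClass(name);
                }
            }
            throw new ClassNotFoundException(name + " is not exposed");
        }
    }

    // Ask a loader to resolve a class by name without initializing it.
    static boolean visible(ClassLoader loader, String name) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader app = ApiOnlyClassLoaderSketch.class.getClassLoader();
        ClassLoader filtered = new FilteringClassLoader(app, Arrays.asList(PublicApi.class.getName()));
        System.out.println("PublicApi visible: " + visible(filtered, PublicApi.class.getName()));
        System.out.println("Internal visible: " + visible(filtered, Internal.class.getName()));
        System.out.println("java.lang.String visible: " + visible(filtered, "java.lang.String"));
    }
}
```

Running `main` prints `true`, `false`, and `true` for the three checks. Because the filtered loader returns the application loader's own `Class` objects, allowed classes are shared with the launcher rather than duplicated, while anything off the allow-list simply fails to resolve.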



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
