[
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347446#comment-14347446
]
Steve Loughran commented on HADOOP-11656:
-----------------------------------------
[[email protected]], as someone downstream, I know you know the situation we
have now; everyone who goes downstream experiences this, with HBase and Oozie being
core pain points. Not exposing the transitive dependencies means that you can
stop worrying about what version of Guava or protobuf is used by Hadoop,
leaving only our consistent semantics to maintain.
The native lib problem will mean that no more than one version of the Hadoop
JARs can be reliably loaded.
Now, unless I'm confused about how classloaders bootstrap, it has to be done in
an order; classloader above classloader, with OSGi doing some magic at startup
so the first CL can pick up stuff from external CLs and make them visible to
others.
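
To make the ordering concrete, here is a minimal sketch of a parent-delegating
chain using only the standard JDK API. The class name ClassLoaderChain and the
empty child loader are made up for illustration; nothing here is part of any
Hadoop proposal, and the OSGi-style cross-loader visibility is not modelled.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderChain {

    /** Walks the parent links from the given loader up to the bootstrap loader. */
    static List<String> chainOf(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // The bootstrap loader is represented by null in the JDK API.
        names.add("bootstrap");
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical child loader with no URLs of its own, parented on the
        // system (application) class loader.
        ClassLoader child =
            new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
        // Prints the chain from the child upwards; the loaders in the middle
        // differ between JDK versions.
        System.out.println(chainOf(child));
    }
}
```

Each loader asks its parent first by default, so whatever sits lower in the
chain has to be created after, and by code loaded from, something higher up.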
Does this mean that adoption of the new CL is a whole new startup process? If
so, it is going to be visible to everything downstream. Now, we could design
YARN-679 to be ready for this, so if you adopt that as the launcher for your
app then you can get the CL setup in there.
But what about every single client app that wants to talk HDFS? We may be able
to go to HBase & Accumulo & say "new launcher", maybe go to Spark and say "your
AM needs to do this", but it's harder to say "your general purpose code to read
off HDFS must now use our CL chain to work". Especially for the use case
"webapp running in Tomcat with the Classloader isolation of Java EE".
Things like that aren't going to work if we start imposing a new CL; they will
need to flip a switch to say "no dependency magic".
So why is this being proposed as "on-by-default"? And, since there isn't a
clear proposal yet, are we trying to define that we should be incompatible
from the outset?
Please: give us a proposal, let's work towards an implementation, actually test
this downstream including in an Oozie version (hence Tomcat tests), in-cluster
apps, and remote client apps. Then we can consider whether or not it would be
justifiable to say "you must do this to move to Hadoop 3".
Oh, and given the schedules, we should start planning for Java 9 & Jigsaw...
> Classpath isolation for downstream clients
> ------------------------------------------
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Labels: classloading, classpath, dependencies
>
> Currently, Hadoop exposes downstream clients to a variety of third party
> libraries. As our code base grows and matures we increase the set of
> libraries we rely on. At the same time, as our user base grows we increase
> the likelihood that some downstream project will run into a conflict while
> attempting to use a different version of some library we depend on. This has
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to
> off and they don't do anything to help dependency conflicts on the driver
> side or for folks talking to HDFS directly. This should serve as an umbrella
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when
> executing user provided code, whether client side in a launcher/driver or on
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want
> to run substantially ahead or behind the versions we need and the project is
> freer to change our own dependency versions because they'll no longer be in
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases
> written in the comments.
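
As a rough illustration of goal 2) in the description above (user code only
sees public API classes), one possible shape is a class loader that refuses to
resolve anything outside an allow-list of package prefixes. This is only a
sketch under assumptions: the class name ApiFilteringClassLoader and the
prefix-based rule are hypothetical and are not taken from the issue or from
any Hadoop code.

```java
public class ApiFilteringClassLoader extends ClassLoader {

    private final String[] visiblePrefixes;

    /**
     * @param parent          loader that actually holds the classes
     * @param visiblePrefixes package prefixes user code is allowed to see
     */
    public ApiFilteringClassLoader(ClassLoader parent, String... visiblePrefixes) {
        super(parent);
        this.visiblePrefixes = visiblePrefixes.clone();
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        // Hide everything that is not on the allow-list, even if the parent
        // could load it.
        if (!isVisible(name)) {
            throw new ClassNotFoundException(name + " is hidden from downstream code");
        }
        return super.loadClass(name, resolve);
    }

    boolean isVisible(String className) {
        for (String prefix : visiblePrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Only java.lang is exposed here; a real allow-list would name the
        // JDK packages plus the public API packages.
        ClassLoader filtered = new ApiFilteringClassLoader(
            ClassLoader.getSystemClassLoader(), "java.lang.");
        System.out.println(filtered.loadClass("java.lang.String"));
        try {
            filtered.loadClass("java.util.ArrayList");
        } catch (ClassNotFoundException expected) {
            System.out.println("hidden: " + expected.getMessage());
        }
    }
}
```

User code loaded through a child of such a loader could then bring its own
copy of a third party library without seeing the one behind the filter.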