[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

Steve Loughran (JIRA) Mon, 02 Mar 2015 12:02:47 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343673#comment-14343673
 ]


Steve Loughran commented on HADOOP-11656:
-----------------------------------------

I'm not trying to stop this work; I do agree that it needs fixing. I'm just 
wondering how to do this in a way which (a) has tangible immediate benefits in 
2015 and (b) keeps Hadoop 3.x a low-cost, low-risk update, not a Perl 6 or 
Python 3.

Maybe there are multiple strategies to take here, short term and long term:

Short term (2.x)
# Hadoop works across all shipping Guava versions, so update Guava in 2.8 
(giving a warning in 2.7 that it is the last release on the current version)
# get the OSGi patches in, so that anyone who wants to use Hadoop 2.x code 
within an OSGi-enabled JVM can.

Longer term (3.x)

# split client/server artifacts, with a leaner client (which can still use 
Guava, protobuf, SLF4J &c) that strips out the pure server-side code from 
HDFS, so that at least fewer dependencies are introduced there.
# maybe a pure-REST client built on Jersey (and its dependencies), supporting 
SPNEGO-authenticated interaction with WebHDFS, YARN and other apps. This will 
underperform compared to in-cluster HDFS apps, but should be sufficient for 
remote interaction.
# classpath isolation as proposed here (somehow)
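
To make the pure-REST option a little more concrete, here is a minimal sketch 
of the only piece such a client strictly needs before it can issue a request: 
a WebHDFS URL. It uses nothing but the JDK, so it adds no third-party 
dependencies to the caller's classpath. The class and method names are 
hypothetical (not part of Hadoop), the host and port are placeholders, and 
SPNEGO authentication and the HTTP call itself are deliberately left out.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper, not part of Hadoop: builds WebHDFS REST URLs with JDK classes only. */
final class WebHdfsUrls {
    private WebHdfsUrls() {}

    /**
     * Builds the URL for a WebHDFS GETFILESTATUS call on an absolute HDFS path.
     * Assumes plain HTTP; host and port must point at the NameNode's HTTP endpoint.
     */
    static URI fileStatusUri(String host, int port, String hdfsPath) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        try {
            // The multi-argument URI constructor percent-encodes illegal characters in the path.
            return new URI("http", null, host, port,
                    "/webhdfs/v1" + hdfsPath, "op=GETFILESTATUS", null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build WebHDFS URL for " + hdfsPath, e);
        }
    }
}
```

A caller would pass the resulting URI to whatever HTTP client it already has, 
for example WebHdfsUrls.fileStatusUri("namenode.example.com", 50070, "/user/alice"), 
where the host name and port are assumptions for illustration.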



> Classpath isolation for downstream clients
> ------------------------------------------
>
>                 Key: HADOOP-11656
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11656
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>              Labels: classloading, classpath, dependencies
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
