[
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364457#comment-15364457
]
Allen Wittenauer commented on HADOOP-13335:
-------------------------------------------
bq. Not yet - until someone feels the need to have hdfs used as a separate
command to do something other than filesystem operations. (Really hope there's
no hdfs jar command)
Not on my watch. :) It's pretty clear from this JIRA just how much damage the
extra YARN stuff has caused. (and this is just the tip of the iceberg...)
Plus, between HADOOP-11485, HADOOP-12930, and a handful of other features,
there are much better ways for 1st, 2nd, and 3rd parties to integrate extra
stuff. [1]
bq. I don't see the difference between hadoop jar and yarn jar other than a
different set of variables being set and respected by the different commands.
That's correct. But those extra vars make a world of difference. In branch-2,
HADOOP_OPTS and YARN_OPTS don't cross. Ever. This effectively means the
hadoop, hdfs, and mapred entry points are configured differently from yarn.
bq. Stepping back - should the YARN_* parameters exist, and should yarn jar
exist? If I understand you correctly, I think you're trying to get rid of some
of this.
That's absolutely correct: none of this should exist and I've worked really
hard on either removing or hiding the complexity going forward. But this only
gets easier in trunk. It's way too late and way too hard to fix this mess in
branch-2.
bq. If 'yarn jar' is something that we think is confusing, or something we
potentially want to get rid of - I'd say it's better to not print any warning
at all - and leave hadoop jar as is?
It's going to be hard to take yarn jar or hadoop jar away. It's doubtful they
will ever get removed. That said, we can at least make them act and work the
same way. To me, that's the ultimate goal and it's pretty close to what
happens in trunk:
1. yarn command sucks in yarn-env.sh, hadoop-env.sh, yarn-config.sh and
hadoop-config.sh in a way that should be mostly conflict-free. (non-yarn
commands do not pull in yarn-x.sh, obviously)
2. If YARN_OPTS is defined, yarn x (jar, rmadmin, etc) will use it but throw a
deprecation warning.
3. Otherwise, use HADOOP_OPTS.
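Steps 2 and 3 amount to a simple fallback. A minimal sketch, assuming a
hypothetical helper name (resolve_opts) and an illustrative warning text;
neither is taken from the actual trunk scripts:

```shell
#!/usr/bin/env bash

# Hypothetical helper, for illustration only; not the real trunk code.
# Prints the JVM options a yarn subcommand would end up using.
resolve_opts() {
  if [ -n "${YARN_OPTS:-}" ]; then
    # The deprecated variable is still honored, but the user is warned.
    echo "WARNING: YARN_OPTS is deprecated; use HADOOP_OPTS instead." >&2
    printf '%s\n' "${YARN_OPTS}"
  else
    # No YARN-specific override: fall back to the shared variable.
    printf '%s\n' "${HADOOP_OPTS:-}"
  fi
}
```

The warning goes to stderr so that callers capturing the resolved options on
stdout are not affected by it.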
As folks migrate to a release based on trunk, this extra fluff will go away at
least configuration-wise.
Eventually, when we can remove support for all of these deprecated vars, this
will reduce code complexity and give us (effectively) one code path to test.
But bear in mind that we're years away from YARN_OPTS and friends disappearing.
It's been 5+ years since our last trunk release. I'll likely be dead by the
time 4.x comes out and these useless YARN_* vars can get culled officially.
But the work has to start now.
bq. The hive binary could unset YARN_OPTS / YARN_CLIENT_OPTS - and leave them
intact for the session/shell from where the hive binary was invoked.
Again, if hive wants to stick to using hadoop jar, this would be my
advice. [2] Just keep in mind this also means that any user settings that
they might have wanted to apply to their YARN environment will not kick in.
It's no different than what is happening today, but it may not reflect what
users want. Thus we're back to why the warnings went in.
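For illustration, one way a wrapper script could drop the two variables for a
single invocation while leaving the caller's shell untouched. This is a
sketch only; run_without_yarn_opts is a hypothetical name, not something hive
or hadoop ships:

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command with the YARN-specific option
# variables removed, without modifying the caller's environment.
run_without_yarn_opts() {
  (
    # The parentheses start a subshell, so the unset stays local to it.
    unset YARN_OPTS YARN_CLIENT_OPTS
    "$@"
  )
}
```

A wrapper could then call something like
`run_without_yarn_opts hadoop jar some.jar SomeClass` (jar and class names are
placeholders). A script that is executed rather than sourced cannot change its
parent shell's environment anyway; the subshell just keeps the unset from
leaking into the rest of the same script.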
[1] I'm not saying the community would do this, but let's use hive as an
example here of how much more powerful things are in trunk. With HADOOP-12930,
it's now possible for hive to add a 'hadoop hive' command or a (even more
outrageous!) 'hdfs hivefs' command. Rather than integrate *outside* the
framework, one could integrate *inside* and pull exactly the information
required. This will hopefully put 'hadoop jar' and 'yarn jar' on life support.
We'll keep them around but it's going to be much more desirable to just
integrate directly. The new 'mapred streaming' command is a great example
here: why make users call hadoop jar with some weird version number when it's
now trivial to dynamically add commands at build or configure time?
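To make the "integrate inside" idea concrete, here is a rough, self-contained
sketch of name-based subcommand dispatch. It only mimics the general idea: the
names hadoop_subcommand_hive and dispatch, and the messages, are assumptions
for illustration. The real HADOOP-12930 hook should be taken from the Hadoop
shell documentation, not from this sketch:

```shell
#!/usr/bin/env bash

# Hypothetical add-on function that a third party (e.g. hive) might supply.
hadoop_subcommand_hive() {
  echo "hive invoked with: $*"
}

# Toy dispatcher: run a function named <program>_subcommand_<subcommand>
# if one has been defined, otherwise report an unknown subcommand.
dispatch() {
  local prog="$1" sub="$2"
  shift 2
  # Note: command -v would also match an external command of the same name,
  # which is acceptable for a sketch.
  if command -v "${prog}_subcommand_${sub}" >/dev/null 2>&1; then
    "${prog}_subcommand_${sub}" "$@"
  else
    echo "${prog}: unknown subcommand '${sub}'" >&2
    return 1
  fi
}
```

With these definitions, `dispatch hadoop hive -e 'show tables'` calls the
add-on function, while `dispatch hadoop nosuch` fails with a non-zero status.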
[2] Very long term (post-3.x), it would probably be better if hive called
hadoop-config.sh and/or hadoop-functions.sh directly. This would bypass the
middleman and give much better control. I'd be very interested to hear what
sort of holes we have in the functionality here that make this
hard/impossible. Off the top of my head, I suspect we need to wrap the series
of function calls in hadoop-config.sh into one big function, but would love to
hear your insight on this.
> Add an option to suppress the 'use yarn jar' warning or remove it
> -----------------------------------------------------------------
>
> Key: HADOOP-13335
> URL: https://issues.apache.org/jira/browse/HADOOP-13335
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 2.7.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HADOOP-13335.01.patch, HADOOP-13335.02.patch,
> HADOOP-13335.02_branch-2.patch, HADOOP-13335.03.patch,
> HADOOP-13335.03_branch-2.patch, HADOOP-13335.04.patch
>
>
> https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation'
> warning for 'hadoop jar'.
> hadoop jar is used for a lot more than starting jobs. As an example - hive
> uses it to start all its services (HiveServer2, the hive client, beeline
> etc).
> Using 'yarn jar' to start these services / tools doesn't make a lot of
> sense - there's no relation to yarn other than requiring the classpath to
> include yarn libraries.
> I'd propose reverting the changes where this message is printed if YARN
> variables are set (leave it in the help message), or adding a mechanism which
> would allow users to suppress this WARNING.