[ 
https://issues.apache.org/jira/browse/HADOOP-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364457#comment-15364457
 ] 

Allen Wittenauer commented on HADOOP-13335:
-------------------------------------------

bq. Not yet - until someone feels the need to have hdfs used as a separate 
command to do something other than filesystem operations. (Really hope there's 
no hdfs jar command)

Not on my watch. :) It's pretty clear from this JIRA just how much damage the 
extra YARN stuff has caused.  (and this is just the tip of the iceberg...)  
Plus, between HADOOP-11485, HADOOP-12930, and a handful of other features, 
there are much better ways for 1st, 2nd, and 3rd parties to integrate extra 
stuff. \[1]

bq.  I don't see the difference between hadoop jar and yarn jar other than a 
different set of variables being set and respected by the different commands. 

That's correct. But those extra vars make a world of difference.  In branch-2, 
HADOOP\_OPTS and YARN\_OPTS don't cross.  Ever.  This effectively means the 
hadoop, hdfs, and mapred entry points are configured differently than yarn.

bq. Stepping back - should the YARN\_\* parameters exist?, and should yarn jar 
exist? If I understand you correctly, I think you're trying to get rid of some 
of this.

That's absolutely correct: none of this should exist and I've worked really 
hard on either removing or hiding the complexity going forward. But this only 
gets easier in trunk.  It's way too late and way too hard to fix this mess in 
branch-2.

bq. If 'yarn jar' is something that we think is confusing, or something we 
potentially want to get rid of - I'd say it's better to not print any warning 
at all - and leave hadoop jar as is?

It's going to be hard to take yarn jar or hadoop jar away. It's doubtful they 
will ever get removed. That said, we can at least make them act and work the 
same way.  To me, that's the ultimate goal and it's pretty close to what 
happens in trunk:

1. yarn command sucks in yarn-env.sh, hadoop-env.sh, yarn-config.sh and 
hadoop-config.sh in a way that should be mostly conflict-free. (non-yarn 
commands do not pull in yarn-x.sh, obviously)
2. If YARN\_OPTS is defined, yarn x (jar, rmadmin, etc) will use it but throw a 
deprecation warning.
3. Otherwise use HADOOP\_OPTS
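
A minimal shell sketch of steps 2 and 3 above. The helper name 
resolve\_yarn\_opts and the warning text are hypothetical, chosen for 
illustration; they are not taken from the actual trunk scripts. For simplicity 
the sketch treats an empty YARN\_OPTS the same as an undefined one:

```shell
# Hypothetical helper: prints the options a 'yarn x' command would use.
resolve_yarn_opts() {
  if [ -n "${YARN_OPTS:-}" ]; then
    # Step 2: honor the legacy variable, but warn on stderr.
    echo "WARNING: YARN_OPTS is deprecated; use HADOOP_OPTS instead." >&2
    printf '%s\n' "${YARN_OPTS}"
  else
    # Step 3: fall back to the shared variable.
    printf '%s\n' "${HADOOP_OPTS:-}"
  fi
}
```

The warning goes to stderr so that anything capturing the command's stdout is 
not affected by it.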

As folks migrate to a release based on trunk, this extra fluff will go away at 
least configuration-wise.

Eventually, when we can remove support for all of these deprecated vars, this 
will reduce code complexity and give us (effectively) one code path to test. 
But bear in mind that we're years away from YARN\_OPTS and friends 
disappearing. It's been 5+ years since our last trunk release. I'll likely be 
dead by the time 4.x comes out and these useless YARN\_\* vars can get culled 
officially. But the work has to start now.

bq. The hive binary could unset YARN_OPTS / YARN_CLIENT_OPTS - and leave them 
intact for the session/shell from where the hive binary was invoked.

Which, again, if hive wants to stick to using hadoop jar, this would be my 
advice. \[2]  Just keep in mind this also means that any user settings that 
they might have wanted to apply to their YARN environment will not kick in. 
It's no different than what is happening today, but it may not reflect what 
users want.  Thus we're back to why the warnings went in.
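
A minimal sketch of the unset-in-the-launcher approach quoted above, assuming 
a hypothetical wrapper function named run\_without\_yarn\_opts. The subshell 
keeps the unset local to the launched command, so the variables stay intact 
in the calling session:

```shell
# Hypothetical wrapper: run a command with the YARN-specific vars removed.
run_without_yarn_opts() {
  (
    # The parentheses start a subshell, so this unset never reaches the caller.
    unset YARN_OPTS YARN_CLIENT_OPTS
    "$@"
  )
}
```

A launcher script could then invoke its hadoop jar command line through this 
wrapper. As noted above, any YARN settings the user exported are silently 
ignored for that invocation.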

\[1]  I'm not saying the community would do this, but let's use hive as an 
example here of how much more powerful things are in trunk.  With HADOOP-12930, 
it's now possible for hive to add a 'hadoop hive' command or a (even more 
outrageous!) 'hdfs hivefs' command.  Rather than integrate *outside* the 
framework, one could integrate *inside* and pull exactly the information 
required. This will hopefully put 'hadoop jar' and 'yarn jar' on life support. 
We'll keep them around but it's going to be much more desirable to just 
integrate directly.  The new 'mapred streaming' command is a great example 
here: why make users call hadoop jar with some weird version number when it's 
now trivial to dynamically add commands at build or configure time?
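
To make the idea of dynamically added commands concrete, here is a generic 
shell sketch of hook-based dispatch. It is not the HADOOP-12930 
implementation: the dispatch function, the hook naming pattern, and the 
example hook are all assumptions made for this illustration.

```shell
# Hypothetical third-party hook, named <script>_subcommand_<name>.
hadoop_subcommand_hive() {
  echo "hive subcommand called with: $*"
}

# Minimal dispatcher: run a matching hook if one is defined, else fail.
dispatch() {
  prog="$1"
  sub="$2"
  shift 2
  hook="${prog}_subcommand_${sub}"
  if command -v "$hook" >/dev/null 2>&1; then
    "$hook" "$@"
  else
    echo "${prog}: unknown subcommand '${sub}'" >&2
    return 1
  fi
}
```

With this in place, dispatch hadoop hive --service cli runs the hook with the 
remaining arguments, and no change to the dispatcher is needed when a new hook 
is defined.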

\[2] Very long term (post-3.x), it would probably be better if hive called 
hadoop-config.sh and/or hadoop-functions.sh directly.  This would bypass the 
middleman and give much better control.  I'd be very interested to hear what 
sort of holes we have in the functionality here that make this 
hard/impossible. Off the top of my head, I suspect we need to make one big 
function of the series of function calls in hadoop-config.sh, but would love 
to hear your insight on this.

> Add an option to suppress the 'use yarn jar' warning or remove it
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13335
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13335
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HADOOP-13335.01.patch, HADOOP-13335.02.patch, 
> HADOOP-13335.02_branch-2.patch, HADOOP-13335.03.patch, 
> HADOOP-13335.03_branch-2.patch, HADOOP-13335.04.patch
>
>
> https://issues.apache.org/jira/browse/HADOOP-11257 added a 'deprecation' 
> warning for 'hadoop jar'.
> hadoop jar is used for a lot more than starting jobs. As an example - hive 
> uses it to start all its services (HiveServer2, the hive client, beeline 
> etc).
> Using 'yarn jar' to start these services / tools doesn't make a lot of 
> sense - there's no relation to yarn other than requiring the classpath to 
> include yarn libraries.
> I'd propose reverting the changes where this message is printed if YARN 
> variables are set (leave it in the help message), or adding a mechanism which 
> would allow users to suppress this WARNING.



