Looks like the YARN/MR multihoming doc patch never got committed and hence is not available in the site documentation. You can look at the doc patch in https://issues.apache.org/jira/browse/YARN-2384 (maybe use an online Markdown tool to view it more easily) and check whether you followed the configuration described there. Another comprehensive multihoming document that might help you is here: https://hortonworks.com/blog/multihoming-on-hadoop-yarn-clusters/
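In case it helps while that doc patch is still pending, a minimal sketch of the bind-host properties those documents cover looks roughly like the following (property names are the standard YARN/MapReduce bind-host settings; 0.0.0.0 makes the daemons listen on all interfaces, or you can point them at a specific cluster-facing address):

  <!-- yarn-site.xml: sketch of the multihoming-related bind-host properties -->
  <property>
    <name>yarn.resourcemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.timeline-service.bind-host</name>
    <value>0.0.0.0</value>
  </property>

  <!-- mapred-site.xml: same idea for the JobHistoryServer -->
  <property>
    <name>mapreduce.jobhistory.bind-host</name>
    <value>0.0.0.0</value>
  </property>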
-Gour

From: Jeff Hubbs <[email protected]>
Date: Tuesday, June 5, 2018 at 2:57 PM
To: "[email protected]" <[email protected]>
Subject: 3.1.0 MR work won't distribute after dual-homing NameNode

Hi -

I have a three-node Hadoop 3.1.0 cluster on which the daemons are distributed like so:

Daemons on msba02a...
20112 NameNode
20240 DataNode
24101 JobHistoryServer
20918 WebAppProxyServer
20743 NodeManager
20476 SecondaryNameNode

Daemons on msba02b...
22547 DataNode
22734 ResourceManager
23007 NodeManager

Daemons on msba02c...
10005 NodeManager
9818 DataNode

All three nodes run Gentoo Linux and have either one or two volumes devoted to HDFS; HDFS reports a size of 5.7TiB.

Previously, HDFS and MapReduce (testing with the archetypical "wordcount" job on a 5.8GiB XML file) worked fine in an environment where all three machines were on the same office LAN and got their IP addresses from DHCP; dynamic DNS created network host names based on the machines' host names as reported by their DHCP clients. FQDNs were used for all intra- and inter-machine references in the Hadoop configuration files.

Since then, I've changed things so that msba02a now has a second NIC that connects to an independent LAN shared with the other two machines, which use their built-in NICs as before; msba02b and msba02c reach the Internet by going through NAT on msba02a. /etc/hosts on all three machines has been populated with the static IPs I gave them, like so:

127.0.0.1 localhost
1.0.0.1 msba02a
1.0.0.10 msba02b
1.0.0.20 msba02c

So now, if I shell into msba02a and run the wordcount job with the test XML file sitting in HDFS with replication set to 3, the job *does* run and gives me the expected output file... but the workload doesn't distribute to all cores on all nodes like before; it all executes on msba02a. In fact, it doesn't even run on all cores on msba02a; it seems to light up just one core at any given moment. The job used to run on the cluster in 1m48s; now it takes 5m56s (a ratio I can't understand; these are all four-core, eight-thread machines, so I'd expect a ratio of close to 24:1, not 3:1). The only time the other two nodes light up at all is near the end of the job, when the output file (770MiB) is written out to HDFS.

I've gone through https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html and set the values shown there to 1.0.0.1 in hdfs-site.xml on msba02a (see the P.S. at the end of this message for a rough sketch) in hopes of getting the daemons to bind to the cluster-facing NIC instead of the outward-facing NIC, but it seems to me like HDFS is working exactly as it's supposed to. Note that the ResourceManager daemon runs on msba02b and therefore doesn't need to be bound to a particular NIC; it still uses that machine's only NIC as before, except that now its IP address is static and is resolved via its local /etc/hosts.

The only errors showing up in the daemon logs of any node are lines like "org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted" in hadoop-yarn-resourcemanager-msba02b.log and hadoop-mapred-historyserver-msba02a.log.

As for the hadoop run output, previously, when everything was working, things would get to a point where it would print a series of lines like "map 0% reduce 0%", and that line would repeat with the "map" percentage climbing first and then the "reduce" percentage climbing until both reached 100%, after which the job would wrap up soon afterward.
Now, it intersperses those lines with other output and it skips around, like this:

2018-06-05 17:45:34,338 INFO mapreduce.Job: map 100% reduce 0%
2018-06-05 17:45:36,295 INFO mapred.MapTask: Finished spill 0
2018-06-05 17:45:36,295 INFO mapred.MapTask: (RESET) equator 61480136 kv 15370028(61480112) kvi 13480948(53923792)
2018-06-05 17:45:36,882 INFO mapred.MapTask: Spilling map output
2018-06-05 17:45:36,882 INFO mapred.MapTask: bufstart = 61480136; bufend = 10372007; bufvoid = 104857566
2018-06-05 17:45:36,882 INFO mapred.MapTask: kvstart = 15370028(61480112); kvend = 7835876(31343504); length = 7534153/6553600
2018-06-05 17:45:36,882 INFO mapred.MapTask: (EQUATOR) 17997991 kvi 4499492(17997968)
2018-06-05 17:45:38,774 INFO mapred.MapTask: Finished spill 1
2018-06-05 17:45:38,774 INFO mapred.MapTask: (RESET) equator 17997991 kv 4499492(17997968) kvi 2642780(10571120)
2018-06-05 17:45:38,910 INFO mapred.LocalJobRunner:
2018-06-05 17:45:38,910 INFO mapred.MapTask: Starting flush of map output
2018-06-05 17:45:38,910 INFO mapred.MapTask: Spilling map output
2018-06-05 17:45:38,911 INFO mapred.MapTask: bufstart = 17997991; bufend = 40956853; bufvoid = 104857600
2018-06-05 17:45:38,911 INFO mapred.MapTask: kvstart = 4499492(17997968); kvend = 1327036(5308144); length = 3172457/6553600
2018-06-05 17:45:39,340 INFO mapreduce.Job: map 4% reduce 0%
2018-06-05 17:45:39,684 INFO mapred.MapTask: Finished spill 2
2018-06-05 17:45:39,788 INFO mapred.Merger: Merging 3 sorted segments
2018-06-05 17:45:39,788 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 34645401 bytes
2018-06-05 17:45:40,251 INFO mapred.Task: Task:attempt_local1155504279_0001_m_000002_0 is done. And is in the process of committing
2018-06-05 17:45:40,253 INFO mapred.LocalJobRunner: map > sort
2018-06-05 17:45:40,253 INFO mapred.Task: Task 'attempt_local1155504279_0001_m_000002_0' done.
2018-06-05 17:45:40,253 INFO mapred.Task: Final Counters for attempt_local1155504279_0001_m_000002_0: Counters: 23
  File System Counters
    FILE: Number of bytes read=106419805
    FILE: Number of bytes written=202253153
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=410006948
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=9
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=1
  Map-Reduce Framework
    Map input records=2653033
    Map output records=4553651
    Map output bytes=130562451
    Map output materialized bytes=31060160
    Input split bytes=95
    Combine input records=5425504
    Combine output records=1618222
    Spilled Records=1618222
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=114
    Total committed heap usage (bytes)=1301807104
  File Input Format Counters
    Bytes Read=134348800
2018-06-05 17:45:40,253 INFO mapred.LocalJobRunner: Finishing task: attempt_local1155504279_0001_m_000002_0
2018-06-05 17:45:40,253 INFO mapred.LocalJobRunner: Starting task: attempt_local1155504279_0001_m_000003_0
2018-06-05 17:45:40,254 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-06-05 17:45:40,254 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-06-05 17:45:40,254 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2018-06-05 17:45:40,255 INFO mapred.MapTask: Processing split: hdfs://msba02a:9000/allcat.xml:268435456+134217728
2018-06-05 17:45:40,265 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2018-06-05 17:45:40,266 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2018-06-05 17:45:40,266 INFO mapred.MapTask: soft limit at 83886080
2018-06-05 17:45:40,266 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2018-06-05 17:45:40,266 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2018-06-05 17:45:40,266 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2018-06-05 17:45:40,341 INFO mapreduce.Job: map 100% reduce 0%
2018-06-05 17:45:41,079 INFO mapred.MapTask: Spilling map output
2018-06-05 17:45:41,079 INFO mapred.MapTask: bufstart = 0; bufend = 53799451; bufvoid = 104857600
2018-06-05 17:45:41,079 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 18692744(74770976); length = 7521653/6553600
2018-06-05 17:45:41,079 INFO mapred.MapTask: (EQUATOR) 61425451 kvi 15356356(61425424)
2018-06-05 17:45:43,110 INFO mapred.MapTask: Finished spill 0
2018-06-05 17:45:43,110 INFO mapred.MapTask: (RESET) equator 61425451 kv 15356356(61425424) kvi 13514352(54057408)
2018-06-05 17:45:43,687 INFO mapred.MapTask: Spilling map output
2018-06-05 17:45:43,687 INFO mapred.MapTask: bufstart = 61425451; bufend = 10294846; bufvoid = 104857586
2018-06-05 17:45:43,687 INFO mapred.MapTask: kvstart = 15356356(61425424); kvend = 7816592(31266368); length = 7539765/6553600
2018-06-05 17:45:43,687 INFO mapred.MapTask: (EQUATOR) 17920846 kvi 4480204(17920816)
2018-06-05 17:45:46,275 INFO mapred.MapTask: Finished spill 1
2018-06-05 17:45:46,275 INFO mapred.MapTask: (RESET) equator 17920846 kv 4480204(17920816) kvi 2573716(10294864)
2018-06-05 17:45:46,423 INFO mapred.LocalJobRunner:
2018-06-05 17:45:46,423 INFO mapred.MapTask: Starting flush of map output
2018-06-05 17:45:46,423 INFO mapred.MapTask: Spilling map output
2018-06-05 17:45:46,423 INFO mapred.MapTask: bufstart = 17920846; bufend = 41420321; bufvoid = 104857600
2018-06-05 17:45:46,423 INFO mapred.MapTask: kvstart = 4480204(17920816); kvend = 1126824(4507296); length = 3353381/6553600

Any hints as to why work isn't distributing? It seems to me that this kind of network configuration would be more the norm for Hadoop clusters than one where all nodes share a network with everything else in the environment (in our situation, one driver for isolating cluster traffic is that the data files used may contain NDA-bound data that shouldn't travel the office LAN unencrypted).

Thanks!
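P.S. For reference, what I set in hdfs-site.xml on msba02a is roughly the following (a sketch only; the property names are the NameNode bind-host settings from the HdfsMultihoming page, pointed at the cluster-facing address rather than the 0.0.0.0 wildcard that page mentions):

  <!-- hdfs-site.xml on msba02a: bind NameNode endpoints to the cluster-facing NIC -->
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>1.0.0.1</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-bind-host</name>
    <value>1.0.0.1</value>
  </property>
  <property>
    <name>dfs.namenode.http-bind-host</name>
    <value>1.0.0.1</value>
  </property>
  <property>
    <name>dfs.namenode.https-bind-host</name>
    <value>1.0.0.1</value>
  </property>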
