RE: Mapreduce Job fails if one Node is offline?

wget.null Fri, 21 Oct 2016 02:54:09 -0700

Hey Mike,

dfs.replication has nothing to do with YARN or MapReduce; it's an HDFS setting. 
It defines how many replicas of each block are kept in the cluster. 
When you kill the NM and you don’t have yarn.nodemanager.recovery.enabled 
(https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html)
 set, the containers running on that node are lost or killed, but your 
job will likely keep running and wait until that NM comes back.
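
As a rough sketch of what enabling NM recovery could look like in 
yarn-site.xml on each NodeManager host: the property names are the ones 
described in the NodeManagerRestart page linked above, while the recovery 
directory path and the port number are only illustrative assumptions, so 
adjust them to your setup.

```xml
<!-- yarn-site.xml on each NodeManager host; values are illustrative -->
<property>
  <!-- Turn on work-preserving NM restart -->
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Local directory where the NM keeps its recovery state (assumed path) -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
<property>
  <!-- Fixed RPC port (assumed value) instead of the ephemeral default, -->
  <!-- so the NM uses the same address before and after a restart -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```

The NM has to be restarted to pick this up. As far as I know this is meant 
for restarts of the NM process itself; it does not bring back containers if 
the whole machine is gone.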


http://hortonworks.com/blog/resilience-of-yarn-applications-across-nodemanager-restarts/
http://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_ha_yarn_work_preserving_recovery.html

--alex

--
B: mapredit.blogspot.com

From: Mike Wenzel
Sent: Friday, October 21, 2016 11:29 AM
To: [email protected]
Subject: Mapreduce Job fails if one Node is offline?

I have a small cluster for testing and learning Hadoop:

Node1 - Namenode + ResourceManager + JobhistoryServer
Node2 - SecondaryNamenode
Node3 - Datanode + NodeManager
Node4 - Datanode + NodeManager
Node5 - Datanode + NodeManager

My dfs.replication is set to 2.

When I kill the Datanode and Nodemanager processes on Node5, I expect Hadoop 
to keep running and finish my MapReduce jobs successfully.
In reality the job fails because it tries to transfer blocks to Node5, which is 
offline. Replication is set to 2, so I expect it to see that Node5 is offline 
and work with only the other two nodes.

Can someone please explain to me how Hadoop should work in this case?
If my expectation of Hadoop is correct and someone is willing to help me out, I 
can add logs and configuration.

Best Regards,
Mike.
