Shuffle is configured, and I was able to run MR jobs on this 5-node cluster
before I moved the ResourceManager node from dn4 to dn5.


After analyzing the logs under $HADOOP_LOG_DIR and reviewing the terminal
output, I found that the most likely reason the map tasks hang at 0% is this:
the NodeManager does not start correctly because its connection to the master
(ResourceManager) is refused, which appears to be caused by Google Protocol
Buffers (the error surfaces as com.google.protobuf.ServiceException). As a
result, the slaves in the cluster cannot communicate with the master, and the
job never runs.
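The stack trace shows that the refusal comes from the plain socket connect
(NetUtils.connect / SocketChannelImpl.finishConnect), before any RPC payload is
exchanged, so a bare TCP test can confirm whether anything is listening on the
address the NodeManager is trying to reach. Below is a minimal diagnostic
sketch; the class RmPortCheck and its helper canConnect are hypothetical (not
part of Hadoop), and the default host and port (dn4, 50030) are simply copied
from the log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical diagnostic helper (not part of Hadoop): checks whether a TCP
 * connection to host:port can be opened. This is only the socket step that
 * fails in the trace; no protobuf or RPC traffic is sent.
 */
public class RmPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // "Connection refused", unknown hosts and timeouts all end up here.
            System.err.println("Cannot reach " + host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults copied from the log: dn3 was trying to reach dn4:50030.
        String host = args.length > 0 ? args[0] : "dn4";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50030;
        boolean ok = canConnect(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Run from dn3, for example java RmPortCheck dn4 50030 and then
java RmPortCheck dn5 50030 (the second run assumes the ResourceManager on dn5
listens on the same port), to compare the old and new ResourceManager hosts.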


By the way, I compiled and installed Protocol Buffers (make / make install) on
all 5 nodes at the same time using a parallel ssh tool. Maybe something is
wrong with the environment on dn5.

Thanks a lot!



The most important part of the NodeManager output is here:
2011-12-21 21:23:21,142 ERROR service.CompositeService (CompositeService.java:start(72)) - Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
Caused by: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
    ... 3 more
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From dn3/192.168.3.227 to dn4:50030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
    at $Proxy14.registerNodeManager(Unknown Source)
    at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
    ... 5 more
Caused by: java.net.ConnectException: Call From dn3/192.168.3.227 to dn4:50030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:617)
    at org.apache.hadoop.ipc.Client.call(Client.java:1089)
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
    ... 7 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:419)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:460)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:557)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
    at org.apache.hadoop.ipc.Client.call(Client.java:1065)
    ... 8 more
2011-12-21 21:23:21,143 INFO  event.AsyncDispatcher (AsyncDispatcher.java:run(71)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:69)
    at java.lang.Thread.run(Thread.java:636)
2011-12-21 21:23:21,144 INFO  service.AbstractService (AbstractService.java:stop(75)) - Service:Dispatcher is stopped.
