Hi all,
When I run some jobs on Hadoop, some datanodes die, and then the job eventually
fails. But the datanode process is still alive, and when the cluster calms down, the
dead datanode comes back.
When a datanode is down, I see error logs like this:
2/01/14 14:08:41 INFO mapred.JobClient: Task Id : attempt_201201082210_0051_m_000313_0, Status : FAILED
java.io.IOException: pipe child exception
	at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:225)
	at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.io.IOException: Could not obtain block: blk_3541449604139837149_1405226 file=/testdata/part-00313
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1993)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1800)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948)
	at java.io.DataInputStream.read(DataInputStream.java:83)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:160)
	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
	at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:88)
	... 7 more
The message "Could not obtain block: blk_* ..." reminds me of
"dfs.datanode.max.xcievers", but I have already set it to 4096.
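One thing that may be worth ruling out: each datanode reads dfs.datanode.max.xcievers from its own hdfs-site.xml when it starts, so the 4096 value only takes effect if it is present in the config on every datanode host and the datanodes were restarted afterwards. The sketch below is a minimal, JDK-only Java program that parses a Hadoop-style site file and prints the value it finds for that property. The class name XcieversCheck and the default path conf/hdfs-site.xml are assumptions for illustration, not part of Hadoop.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads one property from a Hadoop-style *-site.xml file.
// It only parses <property><name>..</name><value>..</value></property> entries;
// it does not handle XInclude, "final" flags, or variable expansion.
public class XcieversCheck {

    // Returns the value of the named property, or null if the file does not define it.
    public static String readProperty(File siteXml, String name) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(siteXml);
        NodeList props = doc.getElementsByTagName("property");
        String value = null;
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                // Keep the last matching entry if the property appears more than once.
                value = values.item(0).getTextContent().trim();
            }
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default location; pass the real hdfs-site.xml path as the first argument.
        String path = args.length > 0 ? args[0] : "conf/hdfs-site.xml";
        String v = readProperty(new File(path), "dfs.datanode.max.xcievers");
        System.out.println(v == null
                ? "dfs.datanode.max.xcievers is not set in " + path
                : "dfs.datanode.max.xcievers = " + v);
    }
}
```

For example, running `java XcieversCheck /path/to/hdfs-site.xml` on each datanode host shows whether the file really contains the setting. It only inspects the file, not the running datanode, so a changed value still needs a datanode restart to take effect.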
How can I resolve this problem?