________________________________
From: 白 瑶瑶 on behalf of 白 瑶瑶 <[email protected]>
Sent: 18 October 2018 10:33
Subject: NameNode question: consultation and advice


Hi,

   My production Hadoop HA cluster has recently had a recurring problem: both 
NameNodes hang frequently, with an error I cannot resolve. The active NameNode 
crashes with the error below, and the standby NameNode then fails with the same 
error instead of taking over. The error is as follows:




2018-10-18 15:51:36,311 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.117.29.24
2018-10-18 15:51:36,311 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2018-10-18 15:51:36,311 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 3420935
2018-10-18 15:51:38,738 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 19 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 10 SyncTimes(ms): 180 2525
2018-10-18 15:51:38,765 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data/hadoop/tmp/dfs/name/current/edits_inprogress_0000000000003420935 -> /data/hadoop/tmp/dfs/name/current/edits_0000000000003420935-0000000000003420953
2018-10-18 15:51:38,765 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 3420954
2018-10-18 15:51:44,767 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:45,768 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:46,769 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:47,770 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9004 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:48,771 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10005 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:49,771 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11006 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:50,773 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12007 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:51,774 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13008 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:52,774 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14009 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:53,776 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15010 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:54,777 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16011 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:55,778 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:56,780 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18014 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:57,781 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19015 ms (timeout=20000 ms) for a response for startLogSegment(3420954). Succeeded so far: [10.117.29.25:8485]
2018-10-18 15:51:58,767 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: starting log segment 3420954 failed for required journal (JournalAndStream(mgr=QJM to [10.117.29.25:8485, 10.117.29.24:8485, 10.117.29.23:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:403)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$3.apply(JournalSet.java:222)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1237)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1206)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1300)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5836)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1122)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2018-10-18 15:51:58,768 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-10-18 15:51:58,773 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at kvmserver25/10.117.29.25
************************************************************/
2018-10-18 16:04:13,143 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = kvmserver25/10.117.29.25


 From the log it looks like only the local JournalNode (10.117.29.25:8485) 
acknowledged startLogSegment(3420954), and the other two JournalNodes never 
responded within the 20000 ms timeout, so the write quorum of 2 out of 3 was not 
reached. I would like to ask: under what circumstances does this error occur, 
and what would you suggest to resolve it?
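
 In case it helps anyone answering: the "timeout=20000 ms" in the log matches 
the defaults of the QJM client timeouts from hdfs-default.xml. A sketch of the 
corresponding hdfs-site.xml properties is below, with the default values shown; 
I assume raising them would only mask slow JournalNodes rather than fix the 
underlying cause.

  <!-- QJM timeouts the NameNode applies while waiting on the JournalNodes.
       Values shown are the hdfs-default.xml defaults, which match the
       "timeout=20000 ms" seen in the log above. For reference only. -->
  <property>
    <name>dfs.qjournal.start-segment.timeout.ms</name>
    <value>20000</value>
  </property>
  <property>
    <name>dfs.qjournal.write-txns.timeout.ms</name>
    <value>20000</value>
  </property>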

   Thank you.



 BAI
