Re: hdfs namenode fails over frequently due to timeout with zkfc

HK Wed, 18 Sep 2019 21:20:00 -0700

Are you checking the ZKFC process logs and jstack?
At what stage is the ZKFC timing out? Is the ZK session timing out, or is the
namenode health monitoring timing out?
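
For reference, here is a minimal sketch of where those two timeouts are
configured, assuming a stock Hadoop 2.x HA setup with the ZKFC reading
core-site.xml. The values are illustrative only, not recommendations:

```xml
<!-- core-site.xml, as read by the ZKFC. Values are illustrative. -->
<property>
  <!-- ZooKeeper session timeout used by the ZKFC -->
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>10000</value>
</property>
<property>
  <!-- Timeout for the ZKFC health-monitor RPC to the namenode -->
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>90000</value>
</property>
```

Raising either one only hides the symptom if the namenode is genuinely
overloaded, so it is worth knowing which of the two is firing first.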



On Thu, Sep 19, 2019 at 9:17 AM Wenqi Ma <[email protected]> wrote:

> HDFS version is 2.7.7
>
> We have 500+ nodes, 230 million files and directories, 270 million blocks,
> and 128GB of memory for the namenode. Recently the namenode became unstable
> and has been failing over 5-10 times every day.
>
> According to the jstack, I cannot find any stuck thread. It seems that the
> namenode just cannot handle the requests in time, because the RUNNABLE
> threads are different every time I print the jstack. For example:
> "IPC Server handler 74 on 8020" daemon prio=10 tid=0x00007f5cf4f31000 nid=0x44c5 runnable [0x00007f3ab2fed000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor$BlockIterator.next(DatanodeDescriptor.java:542)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getBlocksWithLocations(BlockManager.java:1069)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getBlocks(BlockManager.java:1044)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlocks(NameNodeRpcServer.java:481)
>     at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.getBlocks(NamenodeProtocolServerSideTranslatorPB.java:86)
>     at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12017)
>
> We have 200 RPC handlers and do not use service-rpc. Would it help to
> enable service-rpc? Or are there any other suggestions?
> Do let me know if you need other information.
> Many thanks.
> --
> Best Regards!
> Wenqi
>
>
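
On the service-rpc question quoted above: a minimal sketch of what enabling
it could look like in hdfs-site.xml, assuming an HA nameservice. The
nameservice name (mycluster), namenode IDs (nn1, nn2), hostnames, port 8022
and the handler count are all hypothetical placeholders. With a service RPC
address set, datanode and ZKFC health-check traffic goes to a separate port
with its own handler pool instead of queueing behind client calls on 8020:

```xml
<!-- hdfs-site.xml. All names, hosts, ports and counts are placeholders. -->
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8022</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8022</value>
</property>
<property>
  <!-- Handler threads dedicated to the service RPC port -->
  <name>dfs.namenode.service.handler.count</name>
  <value>50</value>
</property>
```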
