Hello. I'm encountering a strange problem with my NameNode. My architecture:

- Two NameNodes in HA
- 600 DataNodes
- HDP 3.1.4
- 150 million files and folders
Sometimes, when I query the NameNode with the hdfs client, I get a timeout error like this:

hdfs dfs -ls -d /user/myuser
22/02/14 15:07:44 INFO retry.RetryInvocationHandler: org.apache.hadoop.net.ConnectTimeoutException: Call From <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=<active-namenode-hostname>/<active-namenode-ip>:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover attempts. Trying to failover after sleeping for 2694ms.

Here is what I have already checked:

- NameNode heap: no problem there (75 GB max heap, around 50 GB used).
- Client RPC handler threads: set to 200, which follows the recommendations from the Hadoop Operations book.
- Service RPC is enabled, so client calls are isolated from any problem coming from the DataNodes or the ZKFC.
- General resources seem OK: CPU usage is fine, same for memory, network and IO.
- No firewall is enabled on my NameNodes or my client.

I was wondering what could cause this problem, please? Thank you in advance for your help!

Best regards.
T@le
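For reference, here is the small sketch I use to sample the client RPC queue on the active NameNode via its JMX endpoint. It assumes the Hadoop 3 default HTTP port 9870 and the standard `RpcActivityForPort8020` bean; the hostname is the same placeholder as in the log above, so adjust for your setup.

```python
import json
import urllib.request

# Hypothetical URL for my setup: 9870 is the Hadoop 3 default NameNode HTTP
# port, 8020 is the client RPC port that the timeouts point at.
JMX_URL = ("http://<active-namenode-hostname>:9870/jmx"
           "?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020")

def rpc_summary(jmx_payload: dict) -> dict:
    """Extract the queueing-related counters from a /jmx JSON response."""
    bean = jmx_payload["beans"][0]
    return {
        "call_queue_length": bean["CallQueueLength"],
        "open_connections": bean["NumOpenConnections"],
        "avg_queue_time_ms": bean["RpcQueueTimeAvgTime"],
        "avg_processing_time_ms": bean["RpcProcessingTimeAvgTime"],
    }

def fetch_rpc_summary(url: str = JMX_URL) -> dict:
    """Fetch the JMX bean and summarize it (call this from a client host)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return rpc_summary(json.load(resp))
```

If `call_queue_length` and `avg_queue_time_ms` spike when the timeouts happen, the handlers are saturated; in my case they stay low, which is why I am puzzled.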
