Hello.

I am encountering a strange problem with my namenode. I have the following
architecture:
- Two namenodes in HA
- 600 datanodes
- HDP 3.1.4
- 150 million files and folders

Sometimes, when I query the namenode with the hdfs client, I get a timeout
error like this:
hdfs dfs -ls -d /user/myuser

22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
org.apache.hadoop.net.ConnectTimeoutException: Call From
<my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
failed on socket timeout exception:
  org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while
waiting for channel to be ready for connect. ch :
java.nio.channels.SocketChannel[connection-pending
remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
  For more details see:  http://wiki.apache.org/hadoop/SocketTimeout,
while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
<active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
attempts. Trying to failover after sleeping for 2694ms.
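For what it's worth, the "20000 millis timeout while waiting for channel to
be ready for connect" matches the default value of
ipc.client.connect.timeout. As a quick diagnostic (a sketch, not a fix), the
timeout and retry count can be raised for a single command to see whether
the connection eventually goes through or hangs indefinitely:

```shell
# Assumption: the 20000 ms in the log is ipc.client.connect.timeout
# (default 20000). Raising it for one invocation helps distinguish a
# slow-to-accept namenode from one that never accepts the connection.
hdfs dfs \
  -D ipc.client.connect.timeout=60000 \
  -D ipc.client.connect.max.retries.on.timeouts=10 \
  -ls -d /user/myuser
```

If the command succeeds with the larger timeout, the namenode is accepting
connections too slowly rather than not at all.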

I checked the heap of the namenode and there is no problem (I have 75 GB of
max heap and I'm around 50 GB used).
I checked the client RPC handler threads on the namenode and I'm at 200,
which follows the recommendations from the Hadoop Operations book.
I have the service RPC port enabled to rule out any problem coming from the
datanodes or ZKFC.
General resources seem OK: CPU usage is fine, and the same goes for memory,
network and I/O.
No firewall is enabled on my namenodes or on my client.
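One thing worth noting: the error occurs during the TCP connect itself, not
while waiting for an RPC reply, so with a healthy network and no firewall a
full accept queue on port 8020 is a plausible cause under bursty client
load. A minimal diagnostic sketch (assuming the default listen backlog
ipc.server.listen.queue.size=128 and the default NameNode HTTP port 9870 on
HDP 3.x; adjust if your cluster differs):

```shell
# Run on the active namenode host. For the listening socket, Recv-Q is the
# current accept-queue depth and Send-Q is its limit; Recv-Q pinned at the
# limit means new connections are being dropped, which shows up client-side
# as exactly this kind of connect timeout.
ss -lnt 'sport = :8020'

# RPC-level queueing is visible through the NameNode JMX servlet.
curl -s 'http://<active-namenode-hostname>:9870/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020' \
  | grep -E 'CallQueueLength|NumOpenConnections|RpcQueueTimeAvgTime'
```

If the accept queue turns out to be the bottleneck, raising
ipc.server.listen.queue.size (together with the kernel's net.core.somaxconn)
is the usual knob to look at.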

Does anyone know what could be causing this problem, please?

Thank you in advance for your help!

Best regards.

T@le
