org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. Connection timed out after 20000 ms.

I suspect this value is very low for a namenode with 75 GB of heap usage. Can you increase the value and check the connection again? To increase it, modify the property ipc.client.rpc-timeout.ms in core-site.xml (if it is not present, add it to core-site.xml).
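For reference, a minimal sketch of what that change could look like in core-site.xml. The values below are illustrative, not recommendations. Note also that the "20000 millis timeout while waiting for channel to be ready for connect" in the stack trace is the raw connect timeout, which in stock Hadoop is governed by ipc.client.connect.timeout, whereas ipc.client.rpc-timeout.ms caps the overall RPC call:

```xml
<!-- core-site.xml on the client side; example values only -->
<property>
  <name>ipc.client.connect.timeout</name>
  <!-- Raw TCP connect timeout in ms (default 20000) -->
  <value>60000</value>
</property>
<property>
  <name>ipc.client.rpc-timeout.ms</name>
  <!-- Overall RPC call timeout in ms (0 disables it) -->
  <value>120000</value>
</property>
```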
Thanks & Regards
Amithsha

On Fri, Feb 18, 2022 at 9:17 PM Tale Hive <[email protected]> wrote:

> Hello Tom.
>
> Sorry for my absence of answers; I don't know why Gmail puts your mail
> into spam -_-.
>
> To answer you:
>
> - The metrics callQueueLength, avgQueueTime, avgProcessingTime and the GC
> metrics are all OK.
> - Threads are plenty sufficient (I can see the metrics for them as well,
> and I am below 200, the number I have for the 8020 RPC server).
>
> Did you see my other answers about this problem?
> I would be interested to have your opinion about that!
>
> Best regards.
>
> T@le
>
>
> On Tue, Feb 15, 2022 at 02:16, tom lee <[email protected]> wrote:
>
>> It might be helpful to analyze namenode metrics and logs.
>>
>> What about some key metrics? Examples are callQueueLength, avgQueueTime,
>> avgProcessingTime and GC metrics.
>>
>> In addition, is the number of threads (dfs.namenode.service.handler.count)
>> in the namenode sufficient?
>>
>> Hopefully this will help.
>>
>> Best regards.
>> Tom
>>
>> On Mon, Feb 14, 2022 at 23:57, Tale Hive <[email protected]> wrote:
>>
>>> Hello.
>>>
>>> I encounter a strange problem with my namenode. I have the following
>>> architecture:
>>> - Two namenodes in HA
>>> - 600 datanodes
>>> - HDP 3.1.4
>>> - 150 million files and folders
>>>
>>> Sometimes, when I query the namenode with the hdfs client, I get a
>>> timeout error like this:
>>> hdfs dfs -ls -d /user/myuser
>>>
>>> 22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
>>> org.apache.hadoop.net.ConnectTimeoutException: Call From
>>> <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
>>> failed on socket timeout exception:
>>> org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
>>> while waiting for channel to be ready for connect. ch :
>>> java.nio.channels.SocketChannel[connection-pending
>>> remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
>>> For more details see: http://wiki.apache.org/hadoop/SocketTimeout,
>>> while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
>>> <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
>>> attempts. Trying to failover after sleeping for 2694ms.
>>>
>>> I checked the heap of the namenode and there is no problem (I have 75 GB
>>> of max heap, and I'm around 50 GB used).
>>> I checked the clientRPC threads of the namenode and I'm at 200, which
>>> follows the recommendations from the Hadoop Operations book.
>>> I have serviceRPC enabled to prevent any problem which could be coming
>>> from datanodes or ZKFC.
>>> General resources seem OK: CPU usage is fine, same for memory,
>>> network and IO.
>>> No firewall is enabled on my namenodes or my client.
>>>
>>> I was wondering what could cause this problem, please?
>>>
>>> Thank you in advance for your help!
>>>
>>> Best regards.
>>>
>>> T@le
>>>
>>
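One more observation on the trace above: the failure happens at raw TCP connect, before the NameNode does any RPC processing, so it can be useful to separate network/DNS latency from NameNode load. A minimal diagnostic sketch (assuming Python is available on the client; the hostname below is a placeholder to substitute with your active namenode):

```python
import socket
import time

def check_connect(host, port, timeout_s=20.0):
    """Time a raw TCP connect, independently of the HDFS client.

    Returns (ok, elapsed_seconds). A slow or failed connect here points
    at the network path (DNS, routing, SYN backlog) rather than
    NameNode RPC handling.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens the socket;
        # the context manager closes it immediately after success.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, time.monotonic() - start
    except OSError:
        # Covers refusals, unreachable hosts, and connect timeouts.
        return False, time.monotonic() - start

# Placeholder hostname/port -- substitute your active namenode:
# ok, elapsed = check_connect("active-namenode-hostname", 8020)
```

If this probe is consistently fast while the hdfs client still times out, the problem is more likely on the RPC layer; if the probe itself stalls, look at the network path first.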
