If you are still concerned about the > 2 s delay, you should benchmark with and without load. It will help you find the root cause of the problem.
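A with/without-load comparison like the one suggested above could be scripted roughly as follows. This is a minimal sketch, not part of the thread: it just times an arbitrary command over several runs and reports latency statistics; the `hdfs dfs -ls` target in the comment is the command from the original report and assumes the hdfs CLI is on PATH.

```python
import statistics
import subprocess
import time


def benchmark(cmd, runs=20):
    """Run `cmd` `runs` times and return latency statistics in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        latencies.append(time.monotonic() - start)
    latencies.sort()
    return {
        "min": latencies[0],
        "max": latencies[-1],
        "mean": statistics.mean(latencies),
        "p95": latencies[int(0.95 * (len(latencies) - 1))],
    }


# Example usage (assumes an hdfs client is installed and configured):
#   stats = benchmark(["hdfs", "dfs", "-ls", "-d", "/user/myuser"])
#   print(stats)
```

Running it once on an idle cluster and once under production load, and comparing the p95/max figures, would show whether the 20 s connect timeouts correlate with load.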
On Mon, Feb 21, 2022, 1:52 PM Tale Hive <[email protected]> wrote:

> Hello Amith.
>
> Hm, not a bad idea. If I increase the size of the listenQueue and also
> increase the timeout, the combination of the two may mitigate the problem
> more than increasing the listenQueue size alone.
> It won't solve the problem of acceptance speed, but it could help.
>
> Thanks for the suggestion!
>
> T@le
>
> On Mon, Feb 21, 2022 at 02:33, Amith sha <[email protected]> wrote:
>
>> org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while
>> waiting for channel to be ready for connect.
>> The connection timed out after 20000 ms; I suspect this value is very low
>> for a namenode with 75 GB of heap usage. Can you increase the value to
>> 5 s and check the connection? To increase it, modify the property
>> ipc.client.rpc-timeout.ms in core-site.xml (if it is not present, add it
>> to core-site.xml).
>>
>> Thanks & Regards
>> Amithsha
>>
>> On Fri, Feb 18, 2022 at 9:17 PM Tale Hive <[email protected]> wrote:
>>
>>> Hello Tom.
>>>
>>> Sorry for my late answer, I don't know why Gmail puts your mail
>>> into spam -_-.
>>>
>>> To answer you:
>>>
>>> - The metrics callQueueLength, avgQueueTime, avgProcessingTime and the
>>>   GC metrics are all OK.
>>> - Threads are plenty sufficient (I can see the metrics for them as well,
>>>   and I am below 200, the number I have configured for the 8020 RPC
>>>   server).
>>>
>>> Did you see my other answers about this problem?
>>> I would be interested in your opinion on them!
>>>
>>> Best regards.
>>>
>>> T@le
>>>
>>> On Tue, Feb 15, 2022 at 02:16, tom lee <[email protected]> wrote:
>>>
>>>> It might be helpful to analyze namenode metrics and logs.
>>>>
>>>> What about some key metrics? Examples are callQueueLength,
>>>> avgQueueTime, avgProcessingTime and GC metrics.
>>>>
>>>> In addition, is the number of threads
>>>> (dfs.namenode.service.handler.count) in the namenode sufficient?
>>>>
>>>> Hopefully this will help.
>>>>
>>>> Best regards.
>>>> Tom
>>>>
>>>> On Mon, Feb 14, 2022 at 23:57, Tale Hive <[email protected]> wrote:
>>>>
>>>>> Hello.
>>>>>
>>>>> I encounter a strange problem with my namenode. I have the following
>>>>> architecture:
>>>>> - Two namenodes in HA
>>>>> - 600 datanodes
>>>>> - HDP 3.1.4
>>>>> - 150 million files and folders
>>>>>
>>>>> Sometimes, when I query the namenode with the hdfs client, I get a
>>>>> timeout error like this:
>>>>> hdfs dfs -ls -d /user/myuser
>>>>>
>>>>> 22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
>>>>> org.apache.hadoop.net.ConnectTimeoutException: Call From
>>>>> <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
>>>>> failed on socket timeout exception:
>>>>> org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
>>>>> while waiting for channel to be ready for connect. ch :
>>>>> java.nio.channels.SocketChannel[connection-pending
>>>>> remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
>>>>> For more details see: http://wiki.apache.org/hadoop/SocketTimeout,
>>>>> while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
>>>>> <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
>>>>> attempts. Trying to failover after sleeping for 2694ms.
>>>>>
>>>>> I checked the heap of the namenode and there is no problem (I have
>>>>> 75 GB of max heap and I am around 50 GB used).
>>>>> I checked the client RPC threads of the namenode and I am at 200,
>>>>> which follows the recommendations of the Hadoop Operations book.
>>>>> I have the service RPC enabled to prevent any problem that could come
>>>>> from datanodes or ZKFC.
>>>>> General resources seem OK: CPU usage is fine, and the same goes for
>>>>> memory, network and IO.
>>>>> No firewall is enabled on my namenodes or my client.
>>>>>
>>>>> I was wondering what could cause this problem, please?
>>>>>
>>>>> Thank you in advance for your help!
>>>>>
>>>>> Best regards.
>>>>>
>>>>> T@le
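For reference, the two changes discussed earlier in the thread (raising the RPC timeout and enlarging the listen queue) would land in core-site.xml roughly as below. The 5000 ms value is the 5 s figure proposed in the thread; the 1024 listen-queue value is only an illustrative assumption, not a recommendation from the thread.

```xml
<!-- core-site.xml fragment; values are illustrative -->
<property>
  <name>ipc.client.rpc-timeout.ms</name>
  <value>5000</value> <!-- milliseconds; the 5 s suggested above -->
</property>
<property>
  <name>ipc.server.listen.queue.size</name>
  <value>1024</value> <!-- assumed value for an enlarged listen queue -->
</property>
```

Note that the effective accept backlog is also capped by the kernel (net.core.somaxconn on Linux), so raising the Hadoop-side queue alone may not be enough.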
