org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while
waiting for channel to be ready for connect.

The connection timed out after 20000 ms. I suspect this value is too low
for a namenode with 75 GB of heap usage. Can you increase the value and
check the connection again? To increase it, modify the property
ipc.client.rpc-timeout.ms in core-site.xml (if it is not present, add it
to core-site.xml).
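
The property above would be set in core-site.xml like this (a minimal sketch; 60000 ms is an illustrative value, not a recommendation for this cluster). Note that the connect-phase timeout shown in the error message is governed by a separate property, ipc.client.connect.timeout, whose default of 20000 ms matches the "20000 millis" in the stack trace:

```xml
<!-- core-site.xml fragment: values are illustrative only -->
<property>
  <name>ipc.client.rpc-timeout.ms</name>
  <!-- Overall client RPC timeout, in milliseconds -->
  <value>60000</value>
</property>
<property>
  <name>ipc.client.connect.timeout</name>
  <!-- Connect-phase timeout; the default 20000 ms matches the error above -->
  <value>60000</value>
</property>
```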


Thanks & Regards
Amithsha


On Fri, Feb 18, 2022 at 9:17 PM Tale Hive <[email protected]> wrote:

> Hello Tom.
>
> Sorry for my late answer; I don't know why Gmail puts your mail into
> spam -_-.
>
> To answer you:
>
>    - The metrics callQueueLength, avgQueueTime, avgProcessingTime and the
>    GC metrics are all OK.
>    - The threads are plenty sufficient (I can see their metrics as well,
>    and I am below 200, the number I have configured for the 8020 RPC
>    server).
>
> Did you see my other answers about this problem?
> I would be interested in your opinion on them!
>
> Best regards.
>
> T@le
>
>
> On Tue, Feb 15, 2022 at 02:16, tom lee <[email protected]> wrote:
>
>> It might be helpful to analyze namenode metrics and logs.
>>
>> What about some key metrics? Examples are callQueueLength, avgQueueTime,
>> avgProcessingTime and GC metrics.
>>
>> In addition, is the number of handler threads
>> (dfs.namenode.service.handler.count) on the namenode sufficient?
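
These metrics can be read from the NameNode JMX servlet (http://<namenode>:9870/jmx on Hadoop 3; the port is an assumption for this cluster). A minimal sketch that pulls the RPC-activity bean out of a /jmx-style JSON payload; the payload embedded here is illustrative sample data, not real cluster output, which in practice would come from an HTTP GET against the servlet:

```python
import json

# Illustrative sample of what http://<namenode>:9870/jmx returns; in a
# real check this string would come from urllib.request.urlopen(...).read().
sample_jmx = json.dumps({
    "beans": [
        {
            "name": "Hadoop:service=NameNode,name=RpcActivityForPort8020",
            "CallQueueLength": 0,
            "RpcQueueTimeAvgTime": 0.42,
            "RpcProcessingTimeAvgTime": 1.37,
        }
    ]
})

def rpc_metrics(jmx_json: str, port: int = 8020) -> dict:
    """Extract queue/processing metrics for one RPC server from /jmx output."""
    wanted = f"RpcActivityForPort{port}"
    for bean in json.loads(jmx_json)["beans"]:
        if bean["name"].endswith(wanted):
            return {
                "callQueueLength": bean["CallQueueLength"],
                "avgQueueTimeMs": bean["RpcQueueTimeAvgTime"],
                "avgProcessingTimeMs": bean["RpcProcessingTimeAvgTime"],
            }
    raise KeyError(f"no bean found for {wanted}")

print(rpc_metrics(sample_jmx))
```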
>>
>> Hopefully this will help.
>>
>> Best regards.
>> Tom
>>
>> On Mon, Feb 14, 2022 at 23:57, Tale Hive <[email protected]> wrote:
>>
>>> Hello.
>>>
>>> I am encountering a strange problem with my namenode. I have the
>>> following architecture:
>>> - Two namenodes in HA
>>> - 600 datanodes
>>> - HDP 3.1.4
>>> - 150 million files and folders
>>>
>>> Sometimes, when I query the namenode with the HDFS client, I get a
>>> timeout error like this:
>>> hdfs dfs -ls -d /user/myuser
>>>
>>> 22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
>>> org.apache.hadoop.net.ConnectTimeoutException: Call From
>>> <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
>>> failed on socket timeout exception:
>>>   org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
>>> while waiting for channel to be ready for connect. ch :
>>> java.nio.channels.SocketChannel[connection-pending
>>> remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
>>>   For more details see:  http://wiki.apache.org/hadoop/SocketTimeout,
>>> while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
>>> <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
>>> attempts. Trying to failover after sleeping for 2694ms.
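
One way to check whether the connect phase itself is slow (as opposed to the call being queued once connected) is to time a raw TCP connect to the namenode RPC port. A minimal Python sketch; it connects to a local throwaway listener so it stays self-contained, but in practice you would point it at the <active-namenode-hostname>:8020 placeholder from the log above:

```python
import socket
import time

def time_tcp_connect(host: str, port: int, timeout: float = 20.0) -> float:
    """Return the seconds taken to establish a TCP connection (or raise)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start

# Self-contained demo: listen on an ephemeral local port, then time a
# connect to it. Against a real cluster, use the namenode host and 8020.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

elapsed = time_tcp_connect("127.0.0.1", port)
print(f"connected in {elapsed * 1000:.1f} ms")
server.close()
```

Running this in a loop against the namenode during an incident would show whether the 20-second connect timeout is actually being approached at the TCP level.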
>>>
>>> I checked the heap of the namenode and there is no problem (I have 75 GB
>>> of max heap and I am at around 50 GB used).
>>> I checked the client RPC handler threads of the namenode and I am at
>>> 200, which follows the recommendations of the Hadoop Operations book.
>>> I have the service RPC server enabled, to isolate any load that could be
>>> coming from the datanodes or ZKFC.
>>> General resources seem OK: CPU usage is fine, and the same goes for
>>> memory, network and IO.
>>> No firewall is enabled on my namenodes or my client.
>>>
>>> What could be causing this problem, please?
>>>
>>> Thank you in advance for your help !
>>>
>>> Best regards.
>>>
>>> T@le
>>>
>>
