Hello Amith.

Hm, not a bad idea. If I increase both the listenQueue size and the
timeout, the combination may mitigate the problem more than increasing
the listenQueue size alone.
It won't solve the problem of acceptance speed, but it could help.
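
To check whether such a change has any effect, one option is to time plain
TCP connects to the namenode RPC port before and after it. Below is a minimal
sketch, not a Hadoop tool: the helper names time_connect and probe are
hypothetical, the hostname in the usage comment is the placeholder from the
log in this thread, and it only measures how long the TCP connect takes,
nothing about the RPC call itself.

```python
import socket
import time


def time_connect(host, port, timeout_s=20.0):
    """Try one TCP connect; return (elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    try:
        # The connection is closed as soon as it is established.
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start, None
    except OSError as exc:  # includes timeouts and refused connections
        return time.monotonic() - start, exc


def probe(host, port, attempts=5, timeout_s=20.0, pause_s=1.0):
    """Repeat time_connect and return the list of (elapsed, error) pairs."""
    results = []
    for i in range(attempts):
        results.append(time_connect(host, port, timeout_s))
        if i < attempts - 1:
            time.sleep(pause_s)
    return results


# Hypothetical usage (placeholder hostname, 8020 as in the log):
# for elapsed, err in probe("<active-namenode-hostname>", 8020):
#     print(f"{elapsed:.3f}s", "ok" if err is None else repr(err))
```

Connects that take several seconds, or that fail with a timeout while the
other metrics look healthy, would be consistent with the listen queue filling
up, although this alone does not prove it.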

Thanks for the suggestion!

T@le

On Mon, Feb 21, 2022 at 02:33, Amith sha <[email protected]> wrote:

> org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while
> waiting for channel to be ready for connect.
> The connection timed out after 20000 ms. I suspect this value is too low
> for a namenode with 75 GB of heap usage. Can you increase the value to
> 5sec and check the connection? To increase the value, modify this property:
> ipc.client.rpc-timeout.ms in core-site.xml (if it is not present, add it to
> core-site.xml).
>
>
> Thanks & Regards
> Amithsha
>
>
> On Fri, Feb 18, 2022 at 9:17 PM Tale Hive <[email protected]> wrote:
>
>> Hello Tom.
>>
>> Sorry for the lack of reply; I don't know why Gmail put your mail
>> into spam -_-.
>>
>> To answer your questions:
>>
>>    - The metrics callQueueLength, avgQueueTime, avgProcessingTime and the
>>    GC metrics are all OK
>>    - There are more than enough threads (I can see their metrics too, and
>>    I am below 200, the number I have for the 8020 RPC server)
>>
>> Did you see my other answers about this problem?
>> I would be interested in your opinion on them!
>>
>> Best regards.
>>
>> T@le
>>
>>
>> On Tue, Feb 15, 2022 at 02:16, tom lee <[email protected]> wrote:
>>
>>> It might be helpful to analyze namenode metrics and logs.
>>>
>>> What about some key metrics? Examples are callQueueLength, avgQueueTime,
>>> avgProcessingTime and GC metrics.
>>>
>>> In addition, is the number of threads
>>> (dfs.namenode.service.handler.count) in the namenode sufficient?
>>>
>>> Hopefully this will help.
>>>
>>> Best regards.
>>> Tom
>>>
>>> On Mon, Feb 14, 2022 at 23:57, Tale Hive <[email protected]> wrote:
>>>
>>>> Hello.
>>>>
>>>> I am encountering a strange problem with my namenode. I have the
>>>> following architecture:
>>>> - Two namenodes in HA
>>>> - 600 datanodes
>>>> - HDP 3.1.4
>>>> - 150 million files and folders
>>>>
>>>> Sometimes, when I query the namenode with the hdfs client, I get a
>>>> timeout error like this:
>>>> hdfs dfs -ls -d /user/myuser
>>>>
>>>> 22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
>>>> org.apache.hadoop.net.ConnectTimeoutException: Call From
>>>> <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
>>>> failed on socket timeout exception:
>>>>   org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
>>>> while waiting for channel to be ready for connect. ch :
>>>> java.nio.channels.SocketChannel[connection-pending
>>>> remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
>>>>   For more details see:  http://wiki.apache.org/hadoop/SocketTimeout,
>>>> while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
>>>> <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
>>>> attempts. Trying to failover after sleeping for 2694ms.
>>>>
>>>> I checked the heap of the namenode and there is no problem (I have 75
>>>> GB of max heap, and about 50 GB is in use).
>>>> I checked the client RPC threads for the namenode and I'm at 200,
>>>> which follows the recommendations from the Hadoop Operations book.
>>>> I have the service RPC enabled to prevent any problem that could come
>>>> from the datanodes or ZKFC.
>>>> General resources seem OK: CPU usage is fine, and so are memory,
>>>> network and IO.
>>>> No firewall is enabled on my namenodes or on my client.
>>>>
>>>> What could be causing this problem?
>>>>
>>>> Thank you in advance for your help!
>>>>
>>>> Best regards.
>>>>
>>>> T@le
>>>>
>>>
