@Tale Hive, you provided the details in the first email; I missed it.
Can you provide the output of the commands below from the NameNode?
$ sysctl -n net.core.somaxconn
$ sysctl -n net.ipv4.tcp_max_syn_backlog
$ sysctl -n net.core.netdev_max_backlog
$ netstat -an | grep -c SYN_RECV
$ netstat -an | egrep -v "MYIP.(PORTS|IN|LISTEN)" | wc -l
What do you see in the JournalNode logs? What about the ZooKeeper logs? Are there
any log entries in the NameNode or ZooKeeper along the lines of "Slow sync"?
What is the ZooKeeper heap size?
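Also, the live state of the 8020 accept queue can be watched with ss (a small
sketch; on a listening socket, Recv-Q is the current accept-queue occupancy and
Send-Q is its configured limit):
$ ss -lnt '( sport = :8020 )'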
On Friday, 25 February, 2022, 06:42:31 am GMT+11, gurmukh singh
<[email protected]> wrote:
I checked the heap of the namenode and there is no problem (I have 75 GB of
max heap, with around 50 GB used).
Why a 75 GB heap size for the NN? Are you running a very large cluster? 50 GB
of heap used? Can you check whether you are talking about the NN heap itself or
about the total memory used on the server? Roughly 50 GB of heap corresponds to
about 200 million blocks; do you have that many?
I checked the threads of the clientRPC for the namenode and I'm at 200, which
respects the recommendations from the Hadoop Operations book. The formula is
20 x log2(n), where n is the number of nodes. So, if you have a thousand nodes
we keep it to 200 (20 x log2(1024) = 200), and roughly 20 more threads for each
further doubling of the node count.
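For reference, the same formula can be evaluated with awk (n being the node
count; awk's log() is the natural log, hence the division):
$ awk 'BEGIN { n = 1024; print 20 * log(n) / log(2) }'
200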
I have serviceRPC enabled to prevent any problems that could come from
datanodes or ZKFC.
On Thursday, 24 February, 2022, 12:19:51 am GMT+11, Tale Hive
<[email protected]> wrote:
Hello.
Based on what I saw this morning, I am in fact in the first situation:
- Client sends one packet with the SYN flag to the namenode
- Namenode sends one packet with the SYN, ACK flags to the client
- Client sends n packets with the PSH, ACK flags to the namenode, one per
subfolder
- Namenode sends n PSH, ACK packets to the client, with the content of each
subfolder
So the number of (PSH, ACK) packets in the tcpdump pcap file is not what is
filling the accept queue of the port 8020 ClientRPC server on the namenode.
I'm going to focus on checking the packets with the SYN flag which arrive at
the namenode.
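For that I plan to use a filter like the one below, which matches only the
initial SYNs (SYN set, ACK clear) towards port 8020; the interface choice is an
assumption:
$ tcpdump -nn -i any 'tcp dst port 8020 and tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn'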
Beyond that, because jstack provokes a failover of the active namenode, I don't
have many more leads to follow, except increasing the listenQueue again, which
mitigates the problem rather than solving it.
Best regards.
T@le
On Wed, 23 Feb 2022 at 11:46, Tale Hive <[email protected]> wrote:
Hello guys.
Still investigating the tcpdump. I don't see a lot of packets with the SYN flag
when the listenQueue is full.
What I see is a lot of packets with the "PSH, ACK" flags, with data inside like
this: getListing.org.apache.hadoop.hdfs.protocol.ClientProtocol
/apps/hive/warehouse/<mydb>.db/<mytable>/<mypartition>
It makes me wonder, when a client performs an hdfs dfs -ls -R <HDFS_PATH>, how
many SYN packets will it send to the namenode? One in total, or one per
subfolder? Let's say I have "n" subfolders inside <HDFS_PATH>. Will we have
this situation:
- Client sends one SYN packet to the namenode
- Namenode sends one SYN-ACK packet to the client
- Client sends n ACK or (PSH, ACK) packets to the namenode
Or this situation:
- Client sends n SYN packets to the namenode
- Namenode sends n SYN-ACK packets to the client
- Client sends n ACK or (PSH, ACK) packets
It would mean that a recursive HDFS listing on a path with a lot of subfolders
could harm the other clients by sending too many packets to the namenode?
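To tell the two situations apart from the capture itself, I could count the new
connections versus the payload segments (a sketch; capture.pcap is a
hypothetical name for a file saved with tcpdump -w):
$ tcpdump -nn -r capture.pcap 'tcp dst port 8020 and tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn' | wc -l
$ tcpdump -nn -r capture.pcap 'tcp dst port 8020 and tcp[tcpflags] & tcp-push != 0' | wc -l
In the first situation the SYN count stays near one per client, while the PSH
count grows with the number of subfolders.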
About the jstack: I tried it on the namenode JVM, but it provoked a failover,
as the namenode stopped answering entirely (in particular, no answer to ZKFC),
and the jstack never ended; I had to kill it. I don't know if a kill -3 or a
jstack -F could help, but jstack -F output contains less valuable information.
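If I retry, the kill -3 route would look like this (SIGQUIT makes the JVM print
the thread dump to its own stdout without attaching a debugger, so it may work
even where the attach-based jstack hung; the pid-file path is a guess for my
HDP layout):
$ kill -3 $(cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid)
# the dump then appears in the namenode .out file under the Hadoop log directory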
T@le
On Tue, 22 Feb 2022 at 10:29, Amith sha <[email protected]> wrote:
If TCP errors occur, then you need to check the network metrics. Yes, a tcpdump
can help you.
Thanks & Regards
Amithsha
On Tue, Feb 22, 2022 at 1:29 PM Tale Hive <[email protected]> wrote:
Hello !
@Amith sha
I also checked the system metrics; nothing wrong in CPU, RAM or IO. The only
thing I found was these TCP errors (ListenDrop).
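For the record, I read the kernel-side counters behind these drops like this (a
small sketch; the exact counter wording varies by kernel version, and the
values below are illustrative only):
$ netstat -s | egrep -i 'listen queue|SYNs to LISTEN'
#   1234 times the listen queue of a socket overflowed
#   5678 SYNs to LISTEN sockets dropped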
@HK I'm monitoring a lot of JVM metrics, like "UnderReplicatedBlocks" in the
bean "Hadoop:service=NameNode,name=FSNamesystem". And I found no
under-replicated blocks when the timeout problem occurs, unfortunately. Thanks
for your advice; in addition to the tcpdump, I'll perform some jstacks to see
if I can find what the IPC handlers are doing.
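For reference, I poll that bean over the NN JMX servlet like this (a sketch;
9870 is the default Hadoop 3 HTTP port and may differ on an HDP install):
$ curl -s 'http://<active-namenode-hostname>:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
    | grep UnderReplicatedBlocks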
Best regards.
T@le
On Tue, 22 Feb 2022 at 04:30, HK <[email protected]> wrote:
Hi Tale,
Could you please take a thread dump of the namenode process? Could you please
check what the IPC handlers are doing?
We faced a similar issue when under-replication was high in the cluster, due to
the filesystem writeLock.
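Something like the sketch below would already help (it assumes the usual
"IPC Server handler" thread naming of the namenode RPC server):
$ jstack <namenode-pid> | grep -A 3 'IPC Server handler'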
On Tue, 22 Feb 2022, 8:37 am Amith sha, <[email protected]> wrote:
Check your system metrics too.
On Mon, Feb 21, 2022, 10:52 PM Tale Hive <[email protected]> wrote:
Yeah, the next step for me is to perform a tcpdump right when the problem
occurs. I want to know whether my namenode does not accept connections because
it freezes for some reason, or because there are too many connections at a
time.
My delay is far worse than 2 s: sometimes an hdfs dfs -ls -d /user/<my-user>
takes 20 s or 43 s, and rarely it is even longer than 1 minute. And during this
time, the CallQueue is OK, the heap is OK; I don't find any metric which could
show me a problem inside the namenode JVM.
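To correlate the latency spikes with the ListenDrop counter, I may set up a
simple sampling loop like this (a sketch; path and interval to be adjusted):
$ while true; do date '+%T'; time hdfs dfs -ls -d /user/<my-user> >/dev/null; sleep 60; done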
Best regards.
T@le
On Mon, 21 Feb 2022 at 16:32, Amith sha <[email protected]> wrote:
If you are still concerned about the delay of > 2 s, then you need to benchmark
with and without load. It will help you find the root cause of the problem.
On Mon, Feb 21, 2022, 1:52 PM Tale Hive <[email protected]> wrote:
Hello Amith.
Hm, not a bad idea. If I increase the size of the listenQueue and also increase
the timeout, the combination of both may mitigate the problem more than just
increasing the listenQueue size. It won't solve the problem of acceptance
speed, but it could help.
Thanks for the suggestion !
T@le
On Mon, 21 Feb 2022 at 02:33, Amith sha <[email protected]> wrote:
org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while
waiting for channel to be ready for connect.
The connection timed out after 20000 ms; I suspect this value is too low for a
namenode with 75 GB of heap usage. Can you increase the value and check the
connection again? To increase it, modify the property ipc.client.rpc-timeout.ms
in core-site.xml (if it is not present, add it there).
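The effective client-side values can be checked with hdfs getconf (a sketch;
note that the "waiting for channel to be ready for connect" text in the error
suggests the connect timeout, ipc.client.connect.timeout, whose default is
exactly 20000 ms, so it may be worth checking both keys):
$ hdfs getconf -confKey ipc.client.rpc-timeout.ms
$ hdfs getconf -confKey ipc.client.connect.timeout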
Thanks & Regards
Amithsha
On Fri, Feb 18, 2022 at 9:17 PM Tale Hive <[email protected]> wrote:
Hello Tom.
Sorry for my lack of answers; I don't know why Gmail puts your mail into
spam -_-.
To answer you:
- The metrics callQueueLength, avgQueueTime, avgProcessingTime and the GC
metrics are all OK
- The threads are plenty sufficient (I can see their metrics as well, and I am
below 200, the number I have for the 8020 RPC server)
Did you see my other answers about this problem? I would be interested to have
your opinion on them!
Best regards.
T@le
On Tue, 15 Feb 2022 at 02:16, tom lee <[email protected]> wrote:
It might be helpful to analyze namenode metrics and logs.
What about some key metrics? Examples are callQueueLength, avgQueueTime,
avgProcessingTime and GC metrics.
In addition, is the number of threads (dfs.namenode.service.handler.count) in
the namenode sufficient?
Hopefully this will help.
Best regards.
Tom
On Mon, 14 Feb 2022 at 23:57, Tale Hive <[email protected]> wrote:
Hello.
I encounter a strange problem with my namenode. I have the following
architecture:
- Two namenodes in HA
- 600 datanodes
- HDP 3.1.4
- 150 million files and folders
Sometimes, when I query the namenode with the hdfs client, I get a timeout
error like this:
hdfs dfs -ls -d /user/myuser
22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
org.apache.hadoop.net.ConnectTimeoutException: Call From
<my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020 failed
on socket timeout exception:
org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while
waiting for channel to be ready for connect. ch :
java.nio.channels.SocketChannel[connection-pending
remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
For more details see: http://wiki.apache.org/hadoop/SocketTimeout,
while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
<active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover attempts.
Trying to failover after sleeping for 2694ms.
I checked the heap of the namenode and there is no problem (I have 75 GB of max
heap, with around 50 GB used). I checked the threads of the clientRPC for the
namenode and I'm at 200, which respects the recommendations from the Hadoop
Operations book. I have serviceRPC enabled to prevent any problems that could
come from datanodes or ZKFC. General resources seem OK: CPU usage is fine, same
for memory, network and IO. No firewall is enabled on my namenodes or my
client.
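If needed, I can also re-run the failing command with client debug logging to
see where the time is spent (a sketch using the standard HADOOP_ROOT_LOGGER
override):
$ HADOOP_ROOT_LOGGER=DEBUG,console hdfs dfs -ls -d /user/myuser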
I was wondering what could cause this problem, please?
Thank you in advance for your help!
Best regards.
T@le