Hello Gurmukh Singh.

Thank you for your answers.

Why 75GB heap size for NN? are you running a very large cluster?
> 50 GB of heap used? Can you check are talking about the NN heap itself or
> are you saying about the total mem used on the server?
> 50GB approx means 200 million blocks? do you have that many.
>

I have ~150 million blocks/files, and I sized this heap following the
recommendations here:
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html
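For reference, here is how I read the live heap numbers, straight from the
NN's JMX servlet (the hostname is a placeholder as elsewhere in this thread;
the HTTP port is 50070 on my HDP setup, 9870 is the upstream Hadoop 3
default — adjust to yours):

```shell
# Read NN heap used/max (MB) from the JvmMetrics JMX bean.
curl -s "http://<active-namenode-hostname>:50070/jmx?qry=Hadoop:service=NameNode,name=JvmMetrics" \
  | python3 -c 'import sys, json
b = json.load(sys.stdin)["beans"][0]
print("heap used/max (MB):", int(b["MemHeapUsedM"]), "/", int(b["MemHeapMaxM"]))'
```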

The formula is 20 X log base2(n); where n is the number of nodes.
> So, if you have a thousand nodes we keep it to 200 (20 X log2(1024)=200)
> and then approx 20 threads per thousand nodes.
>

I have 600 datanodes, which normally puts me at 20 * log2(600) ≈ 185
threads for the ClientRPC server (the one which listens on port 8020).
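A quick sanity check of that arithmetic (awk has no log2, so log(n)/log(2)
stands in for it):

```shell
# threads = 20 * log2(number_of_datanodes); prints 185 for 600 nodes
nodes=600
awk -v n="$nodes" 'BEGIN { printf "%.0f\n", 20 * log(n) / log(2) }'
```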

$ sysctl -n net.core.somaxconn
>
> $ sysctl -n net.ipv4.tcp_max_syn_backlog
>
> $ sysctl -n net.core.netdev_max_backlog
>

net.core.somaxconn = 8432
net.ipv4.tcp_max_syn_backlog = 4096
net.core.netdev_max_backlog = 2000
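One thing worth noting alongside those values: the accept backlog the kernel
actually grants the 8020 listener is min(net.core.somaxconn, the backlog the
application asks for), and Hadoop's RPC server asks for
ipc.server.listen.queue.size (core-site.xml, default 128). A small sketch to
compute the effective value — the 128 is an assumption, substitute whatever
you have configured:

```shell
# Effective accept backlog = min(net.core.somaxconn, listen() backlog);
# Hadoop passes ipc.server.listen.queue.size as the listen() backlog.
somaxconn=$(sysctl -n net.core.somaxconn)
listenq=128   # assumption: replace with your ipc.server.listen.queue.size
echo $(( somaxconn < listenq ? somaxconn : listenq ))
```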


$ netstat -an | grep -c SYN_RECV
>
$ netstat -an | egrep -v "MYIP.(PORTS|IN|LISTEN)"  | wc -l
>

I'll check again and get you more information.
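Alongside those netstat counts, the 8020 accept queue can also be watched
directly with ss (assuming iproute2 is installed): on a LISTEN socket,
Recv-Q is the current accept-queue depth and Send-Q is its limit.

```shell
# Accept-queue depth/limit for the ClientRPC listener:
ss -lnt 'sport = :8020'
# Half-open connections, equivalent to the SYN_RECV grep above
# (-H suppresses the header so wc -l counts only sockets):
ss -H -nt state syn-recv | wc -l
```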

What do you see in the JN logs? and what about ZK logs?
> any logs in NN, ZK on the lines of "Slow sync'
>
I didn't check those logs yet; I'm going to check them and get back to you.
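To look for that pattern, I'll run something along these lines (the log
paths are the usual HDP defaults and are an assumption — adjust to your
layout):

```shell
# Scan JN and ZK logs for slow-sync style warnings (paths assumed, HDP-style):
grep -i "slow" /var/log/hadoop/hdfs/hadoop-hdfs-journalnode-*.log | tail -n 20
grep -iE "fsync|slow" /var/log/zookeeper/zookeeper*.out /var/log/zookeeper/zookeeper*.log* 2>/dev/null | tail -n 20
```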

> What is the ZK heap?
>
ZooKeeper heap is 4 GB.

Disk latency
> Heap
> maxClientCnxns=800 (At least) As you have 600 nodes, so you are expecting
> a high job workload)
> jute.maxbuffer=1GB (By default it is very low, especially in a kerberized
> env, it must be bumped up). This setting is not there in HDP by default,
> you have to put under custom-zoo.cfg
>

I'm going to check this also.
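For the record, the two ZooKeeper settings as I understand them —
jute.maxbuffer takes bytes, so 1 GB has to be written out; note that
historically it is a Java system property (-Djute.maxbuffer=...), so I'll
double-check that my ZK version actually picks it up from zoo.cfg:

```properties
# custom-zoo.cfg additions suggested above
maxClientCnxns=800
jute.maxbuffer=1073741824
```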

If you can send me the NN, JN, ZK logs; more than happy to look into it.
>
Yes, I can; I just need time to anonymize everything.
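For the anonymization step I plan a sed pass along these lines (the domain
pattern and file names are hypothetical, to be adjusted to my environment):

```shell
# Replace IPs and internal hostnames before sharing logs
# (patterns are examples only):
sed -E \
  -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/IPADDR/g' \
  -e 's/[a-z0-9.-]+\.mycompany\.internal/HOSTNAME/g' \
  hadoop-hdfs-namenode-*.log > nn-anonymized.log
```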

Thanks again for your help.

Best regards.

T@le



Le jeu. 24 févr. 2022 à 21:28, gurmukh singh <[email protected]> a
écrit :

> Also, as you are using hive/beeline. You can fetch all the config as:
>
> beeline -u "JDBC URL to connect to HS2 " --outputformat=tsv2 -e 'set -v' >
> /tmp/BeelineSet.out
>
> Please attach the BeelineSet.out
>
> On Friday, 25 February, 2022, 07:15:51 am GMT+11, gurmukh singh
> <[email protected]> wrote:
>
>
> on ZK side
>
> Important things:
>
> Disk latency
> Heap
> maxClientCnxns=800 (At least) As you have 600 nodes, so you are expecting
> a high job workload)
> jute.maxbuffer=1GB (By default it is very low, especially in a kerberized
> env, it must be bumped up). This setting is not there in HDP by default,
> you have to put under custom-zoo.cfg
>
>
> If you can send me the NN, JN, ZK logs; more than happy to look into it.
>
>
>
> On Friday, 25 February, 2022, 06:59:17 am GMT+11, gurmukh singh
> <[email protected]> wrote:
>
>
> @Tale Hive you provided the details in the first email, missed it.
>
> Can you provide me the output of below from Namenode:
>
> $ sysctl -n net.core.somaxconn
>
> $ sysctl -n net.ipv4.tcp_max_syn_backlog
>
> $ sysctl -n net.core.netdev_max_backlog
>
> $ netstat -an | grep -c SYN_RECV
>
> $ netstat -an | egrep -v "MYIP.(PORTS|IN|LISTEN)"  | wc -l
>
>
> What do you see in the JN logs? and what about ZK logs?
> any logs in NN, ZK on the lines of "Slow sync'
> What is the ZK heap?
>
>
>
> On Friday, 25 February, 2022, 06:42:31 am GMT+11, gurmukh singh <
> [email protected]> wrote:
>
>
> I checked the heap of the namenode and there is no problem (I have 75 GB
> of max heap, I'm around 50 used GB).
>
>     Why 75GB heap size for NN? are you running a very large cluster?
>     50 GB of heap used? Can you check are talking about the NN heap itself
> or are you saying about the total mem used on the server?
>     50GB approx means 200 million blocks? do you have that many.
>
> I checked the threads of the clientRPC for the namenode and I'm at 200
> which respects the recommendations from the Hadoop Operations book.
>     The formula is 20 X log base2(n); where n is the number of nodes.
>     So, if you have a thousand nodes we keep it to 200 (20 X
> log2(1024)=200) and then approx 20 threads per thousand nodes.
>
> I have serviceRPC enabled to prevent any problem which could be coming
> from datanodes or ZKFC.
>
>
> On Thursday, 24 February, 2022, 12:19:51 am GMT+11, Tale Hive <
> [email protected]> wrote:
>
>
> Hello.
>
> According to what I saw this morning, I can see that I am in the first
> situation in fact :
>
>    - Client sends one packet with flag SYN to namenode
>    - Namenode sends one packet with flags SYN, ACK to the client
>    - Client sends n packets with flags PSH, ACK to the namenode, for each
>    subfolder
>    - Namenode sends n packets PSH, ACK to the client, for the content of
>    each subfolder
>
> So the number of (PSH, ACK) packets from the tcpdump pcap file is not what
> is filling the accept queue of port 8020 ClientRPC server on Namenode.
>
> I'm going to focus on checking the packets with SYN flag which arrive to
> the namenode.
> After that, because the jstack provokes active namenode failover, I don't
> have many more tracks to follow except increase again the listenQueue, to
> mitigate the problem, not to solve it.
>
> Best regards.
>
> T@le
>
>
>
> Le mer. 23 févr. 2022 à 11:46, Tale Hive <[email protected]> a écrit :
>
> Hello guys.
>
> Still investigating the tcpdump. I don't see a lot of packets with the
> flag SYN when the listenQueue is full.
> What I see is a lot of packets with the flag "PSH, ACK" with data inside
> like this :
> getListing.org.apache.hadoop.hdfs.protocol.ClientProtocol
> /apps/hive/warehouse/<mydb>.db/<mytable>/<mypartition>
>
> It makes me wonder, when a client performs an hdfs dfs -ls -R <HDFS_PATH>,
> how many SYN packets will it send to the namenode ? One in total or one by
> subfolder ?
> Let's say I have "n" subfolders inside <HDFS_PATH>. Will we have this
> situation :
> - Client sends one SYN packet to Namenode
> - Namenode sends one SYN-ACK packets to client
> - Client sends n ACK or (PSH, ACK) packets to Namenode
>
> Or this situation :
> - Client sends n SYN packet to Namenode
> - Namenode sends n SYN-ACK packets to client
> - Client sends n ACK or (PSH, ACK)
>
> It would mean an hdfs recursive listing on a path with a lot of subfolders
> could harm the other clients by sending too many packets to the namenode ?
>
> About the jstack, I tried it on the namenode JVM but it provoked a
> failover, as the namenode was not answering at all (in particular, no
> answer to ZKFC), and the jstack never ended, I had to kill it.
> I don't know if a kill -3 or a jstack -F could help, but at least jstack
> -F contains less valuable information.
>
> T@le
>
> Le mar. 22 févr. 2022 à 10:29, Amith sha <[email protected]> a écrit :
>
> If TCP error occurs then you need to check the network metrics. Yes, TCP
> DUMP can help you.
>
>
> Thanks & Regards
> Amithsha
>
>
> On Tue, Feb 22, 2022 at 1:29 PM Tale Hive <[email protected]> wrote:
>
> Hello !
>
> @Amith sha <[email protected]>
> I checked also the system metrics, nothing wrong in CPU, RAM or IO.
> The only thing I found was these TCP errors (ListenDrop).
>
> @HK
> I'm monitoring a lot of JVM metrics like this one :
> "UnderReplicatedBlocks" in the bean
> "Hadoop:service=NameNode,name=FSNamesystem".
> And I found no under replicated blocks when the problem of timeout occurs,
> unfortunately.
> Thanks for you advice, in addition to the tcpdump, I'll perform some
> jstacks to see if I can find what ipc handlers are doing.
>
> Best regards.
>
> T@le
>
>
>
>
>
>
> Le mar. 22 févr. 2022 à 04:30, HK <[email protected]> a écrit :
>
> Hi Tale,
> Could you please thread dump of namenode process. Could you please check
> what ipc handlers are doing.
>
> We faced a similar issue when under-replication was high in the cluster,
> due to the filesystem writeLock.
>
> On Tue, 22 Feb 2022, 8:37 am Amith sha, <[email protected]> wrote:
>
> Check your system metrics too.
>
> On Mon, Feb 21, 2022, 10:52 PM Tale Hive <[email protected]> wrote:
>
> Yeah, next step is for me to perform a tcpdump just when the problem
> occurs.
> I want to know if my namenode does not accept connections because it
> freezes for some reasons or because there is too many connections at a time.
>
> My delay is far worse than 2 s; sometimes an hdfs dfs -ls -d
> /user/<my-user> takes 20 s or 43 s, and rarely it is even longer than 1 minute.
> And during this time, CallQueue is OK, Heap is OK, I don't find any
> metrics which could show me a problem inside the namenode JVM.
>
> Best regards.
>
> T@le
>
> Le lun. 21 févr. 2022 à 16:32, Amith sha <[email protected]> a écrit :
>
> If you still concerned about the delay of > 2 s then you need to do
> benchmark with and without load. To find the root cause of the problem it
> will help.
>
> On Mon, Feb 21, 2022, 1:52 PM Tale Hive <[email protected]> wrote:
>
> Hello Amith.
>
> Hm, not a bad idea. If I increase the size of the listenQueue and if I
> increase timeout, the combination of both may mitigate more the problem
> than just increasing listenQueue size.
> It won't solve the problem of acceptance speed, but it could help.
>
> Thanks for the suggestion !
>
> T@le
>
> Le lun. 21 févr. 2022 à 02:33, Amith sha <[email protected]> a écrit :
>
> org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while
> waiting for channel to be ready for connect.
> Connection timed out after 20000 milli sec i suspect this value is very
> low for a namenode with 75Gb of heap usage. Can you increase the value to
> 5sec and check the connection. To increase the value modify this property
> ipc.client.rpc-timeout.ms - core-site.xml (If not present then add to the
> core-site.xml)
>
>
> Thanks & Regards
> Amithsha
>
>
> On Fri, Feb 18, 2022 at 9:17 PM Tale Hive <[email protected]> wrote:
>
> Hello Tom.
>
> Sorry for my absence of answers, I don't know why gmail puts your mail
> into spam -_-.
>
> To answer you :
>
>    - The metrics callQueueLength, avgQueueTime, avgProcessingTime and GC
>    metric are all OK
>    - Threads are plenty sufficient (I can see the metrics also for them
>    and I  am below 200, the number I have for 8020 RPC server)
>
> Did you see my other answers about this problem ?
> I would be interested to have your opinion about that !
>
> Best regards.
>
> T@le
>
>
> Le mar. 15 févr. 2022 à 02:16, tom lee <[email protected]> a écrit :
>
> It might be helpful to analyze namenode metrics and logs.
>
> What about some key metrics? Examples are callQueueLength, avgQueueTime,
> avgProcessingTime and GC metrics.
>
> In addition, is the number of threads(dfs.namenode.service.handler.count)
> in the namenode sufficient?
>
> Hopefully this will help.
>
> Best regards.
> Tom
>
> Tale Hive <[email protected]> 于2022年2月14日周一 23:57写道:
>
> Hello.
>
> I encounter a strange problem with my namenode. I have the following
> architecture :
> - Two namenodes in HA
> - 600 datanodes
> - HDP 3.1.4
> - 150 millions of files and folders
>
> Sometimes, when I query the namenode with the hdfs client, I got a timeout
> error like this :
> hdfs dfs -ls -d /user/myuser
>
> 22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
> org.apache.hadoop.net.ConnectTimeoutException: Call From
> <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
> failed on socket timeout exception:
>   org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
> while waiting for channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
>   For more details see:  http://wiki.apache.org/hadoop/SocketTimeout,
> while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
> <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
> attempts. Trying to failover after sleeping for 2694ms.
>
> I checked the heap of the namenode and there is no problem (I have 75 GB
> of max heap, I'm around 50 used GB).
> I checked the threads of the clientRPC for the namenode and I'm at 200
> which respects the recommendations from the Hadoop Operations book.
> I have serviceRPC enabled to prevent any problem which could be coming
> from datanodes or ZKFC.
> General resources seems OK, CPU usage is pretty fine, same for memory,
> network or IO.
> No firewall is enabled on my namenodes nor my client.
>
> I was wondering what could cause this problem, please ?
>
> Thank you in advance for your help !
>
> Best regards.
>
> T@le
>
>
