Hello Gurmukh Singh. Thank you for your answers.
> Why 75GB heap size for NN? Are you running a very large cluster? 50 GB
> of heap used? Can you check whether you are talking about the NN heap
> itself or about the total mem used on the server? 50GB approx means
> 200 million blocks? Do you have that many?

I have ~150 million blocks/files and I set up this heap following the
recommendations here:
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html

> The formula is 20 x log base 2(n), where n is the number of nodes.
> So, if you have a thousand nodes we keep it to 200 (20 x log2(1024) = 200),
> and then approx 20 threads per thousand nodes.

I have 600 datanodes, which normally puts me at 20 * log2(600) ≈ 185
threads for the ClientRPC server (the one which listens on port 8020).

> $ sysctl -n net.core.somaxconn
> $ sysctl -n net.ipv4.tcp_max_syn_backlog
> $ sysctl -n net.core.netdev_max_backlog

net.core.somaxconn = 8432
net.ipv4.tcp_max_syn_backlog = 4096
net.core.netdev_max_backlog = 2000

> $ netstat -an | grep -c SYN_RECV
> $ netstat -an | egrep -v "MYIP.(PORTS|IN|LISTEN)" | wc -l

I'll check again and get you more information.

> What do you see in the JN logs? And what about ZK logs?
> Any logs in NN, ZK along the lines of "Slow sync"?

Didn't check these logs, going to check them and get back to you.

> What is the ZK heap?

Zookeeper heap is 4 GB.

> Disk latency
> Heap
> maxClientCnxns=800 (at least; as you have 600 nodes, you are expecting
> a high job workload)
> jute.maxbuffer=1GB (by default it is very low, especially in a
> kerberized env; it must be bumped up). This setting is not there in
> HDP by default, you have to put it under custom-zoo.cfg

I'm going to check this also.

> If you can send me the NN, JN, ZK logs; more than happy to look into it.

I can, yes, I just need time to anonymize everything.

Thanks again for your help.

Best regards.

T@le

On Thu, 24 Feb 2022 at 21:28, gurmukh singh <[email protected]> wrote:

> Also, as you are using hive/beeline, you can fetch all the config as:
>
> beeline -u "JDBC URL to connect to HS2" --outputformat=tsv2 -e 'set -v' > /tmp/BeelineSet.out
>
> Please attach the BeelineSet.out
>
> On Friday, 25 February, 2022, 07:15:51 am GMT+11, gurmukh singh
> <[email protected]> wrote:
>
> On the ZK side, important things:
>
> Disk latency
> Heap
> maxClientCnxns=800 (at least; as you have 600 nodes, you are expecting
> a high job workload)
> jute.maxbuffer=1GB (by default it is very low, especially in a
> kerberized env; it must be bumped up). This setting is not there in
> HDP by default, you have to put it under custom-zoo.cfg
>
> If you can send me the NN, JN, ZK logs; more than happy to look into it.
>
> On Friday, 25 February, 2022, 06:59:17 am GMT+11, gurmukh singh
> <[email protected]> wrote:
>
> @Tale Hive you provided the details in the first email, missed it.
>
> Can you provide me the output of the below from the Namenode:
>
> $ sysctl -n net.core.somaxconn
> $ sysctl -n net.ipv4.tcp_max_syn_backlog
> $ sysctl -n net.core.netdev_max_backlog
> $ netstat -an | grep -c SYN_RECV
> $ netstat -an | egrep -v "MYIP.(PORTS|IN|LISTEN)" | wc -l
>
> What do you see in the JN logs? And what about ZK logs?
> Any logs in NN, ZK along the lines of "Slow sync"?
> What is the ZK heap?
>
> On Friday, 25 February, 2022, 06:42:31 am GMT+11, gurmukh singh
> <[email protected]> wrote:
>
> > I checked the heap of the namenode and there is no problem (I have
> > 75 GB of max heap, I'm around 50 used GB).
>
> Why 75GB heap size for NN? Are you running a very large cluster? 50 GB
> of heap used? Can you check whether you are talking about the NN heap
> itself or about the total mem used on the server? 50GB approx means
> 200 million blocks? Do you have that many?
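[Editor's note] The heap-vs-blocks question above can be sanity-checked with a
back-of-envelope calculation. The sketch below uses the ratio implied by the
"50 GB ≈ 200 million blocks" heuristic quoted in this thread (roughly 250 bytes
of heap per namespace object); this is not an official sizing figure, just the
thread's own rule of thumb made explicit.

```python
def estimated_nn_heap_gb(num_objects: int, bytes_per_object: int = 250) -> float:
    """Rough NameNode heap estimate.

    250 B/object is the ratio implied by the "50 GB ~ 200 million blocks"
    heuristic in this thread, not an official figure; real heaps also need
    headroom for RPC buffers, GC, etc.
    """
    return num_objects * bytes_per_object / (1024 ** 3)

# ~150 million files/blocks, as in this cluster:
print(round(estimated_nn_heap_gb(150_000_000)))  # 35 (GB)
```

By this yardstick ~150 million objects would need on the order of 35 GB of
live heap, which is consistent with the ~50 GB used out of 75 GB reported here.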
> > I checked the threads of the clientRPC for the namenode and I'm at
> > 200, which respects the recommendations from the Hadoop Operations
> > book.
>
> The formula is 20 x log base 2(n), where n is the number of nodes.
> So, if you have a thousand nodes we keep it to 200 (20 x log2(1024) = 200),
> and then approx 20 threads per thousand nodes.
>
> > I have serviceRPC enabled to prevent any problem which could be
> > coming from datanodes or ZKFC.
>
> On Thursday, 24 February, 2022, 12:19:51 am GMT+11, Tale Hive
> <[email protected]> wrote:
>
> Hello.
>
> According to what I saw this morning, I can see that I am in the first
> situation in fact:
>
> - Client sends one packet with flag SYN to namenode
> - Namenode sends one packet with flags SYN, ACK to the client
> - Client sends n packets with flags PSH, ACK to the namenode, for each
>   subfolder
> - Namenode sends n packets PSH, ACK to the client, for the content of
>   each subfolder
>
> So the number of (PSH, ACK) packets from the tcpdump pcap file is not
> what is filling the accept queue of the port 8020 ClientRPC server on
> the Namenode.
>
> I'm going to focus on checking the packets with the SYN flag which
> arrive at the namenode.
> After that, because the jstack provokes an active namenode failover, I
> don't have many more tracks to follow, except increasing the
> listenQueue again, to mitigate the problem, not to solve it.
>
> Best regards.
>
> T@le
>
> On Wed, 23 Feb 2022 at 11:46, Tale Hive <[email protected]> wrote:
>
> Hello guys.
>
> Still investigating the tcpdump. I don't see a lot of packets with the
> SYN flag when the listenQueue is full.
> What I see is a lot of packets with the flag "PSH, ACK" with data
> inside, like this:
>
> getListing.org.apache.hadoop.hdfs.protocol.ClientProtocol
> /apps/hive/warehouse/<mydb>.db/<mytable>/<mypartition>
>
> It makes me wonder: when a client performs an hdfs dfs -ls -R
> <HDFS_PATH>, how many SYN packets will it send to the namenode? One in
> total, or one per subfolder?
> Let's say I have "n" subfolders inside <HDFS_PATH>. Will we have this
> situation:
>
> - Client sends one SYN packet to Namenode
> - Namenode sends one SYN-ACK packet to client
> - Client sends n ACK or (PSH, ACK) packets to Namenode
>
> Or this situation:
>
> - Client sends n SYN packets to Namenode
> - Namenode sends n SYN-ACK packets to client
> - Client sends n ACK or (PSH, ACK) packets
>
> It would mean an hdfs recursive listing on a path with a lot of
> subfolders could harm the other clients by sending too many packets to
> the namenode?
>
> About the jstack, I tried it on the namenode JVM but it provoked a
> failover, as the namenode was not answering at all (in particular, no
> answer to ZKFC), and the jstack never ended; I had to kill it.
> I don't know if a kill -3 or a jstack -F could help, but at least
> jstack -F contains less valuable information.
>
> T@le
>
> On Tue, 22 Feb 2022 at 10:29, Amith sha <[email protected]> wrote:
>
> If a TCP error occurs then you need to check the network metrics. Yes,
> tcpdump can help you.
>
> Thanks & Regards
> Amithsha
>
> On Tue, Feb 22, 2022 at 1:29 PM Tale Hive <[email protected]> wrote:
>
> Hello!
>
> @Amith sha <[email protected]>
> I checked the system metrics as well; nothing wrong in CPU, RAM or IO.
> The only thing I found was these TCP errors (ListenDrop).
>
> @HK
> I'm monitoring a lot of JVM metrics like this one:
> "UnderReplicatedBlocks" in the bean
> "Hadoop:service=NameNode,name=FSNamesystem".
> And I found no under-replicated blocks when the timeout problem
> occurs, unfortunately.
> Thanks for your advice; in addition to the tcpdump, I'll perform some
> jstacks to see if I can find out what the ipc handlers are doing.
>
> Best regards.
>
> T@le
>
> On Tue, 22 Feb 2022 at 04:30, HK <[email protected]> wrote:
>
> Hi Tale,
> Could you please take a thread dump of the namenode process? Could you
> please check what the ipc handlers are doing?
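[Editor's note] Once a thread dump is obtained (jstack, or kill -3 to write it
to stdout without attaching), the IPC handler question above can be answered by
tallying handler thread states. A minimal sketch, assuming the usual NameNode
thread-name pattern "IPC Server handler <n> on <port>"; the sample dump below
is made up for illustration:

```python
from collections import Counter

def ipc_handler_states(jstack_output: str) -> Counter:
    """Tally java.lang.Thread.State values of IPC handler threads in a dump."""
    states = Counter()
    in_handler = False
    for line in jstack_output.splitlines():
        if line.startswith('"IPC Server handler'):
            in_handler = True  # next State line belongs to a handler
        elif in_handler and "java.lang.Thread.State:" in line:
            states[line.split("java.lang.Thread.State:")[1].split()[0]] += 1
            in_handler = False
    return states

sample = '''\
"IPC Server handler 0 on 8020" #50 daemon prio=5
   java.lang.Thread.State: WAITING (parking)
"IPC Server handler 1 on 8020" #51 daemon prio=5
   java.lang.Thread.State: RUNNABLE
"DataXceiver" #99 daemon prio=5
   java.lang.Thread.State: RUNNABLE
'''
print(ipc_handler_states(sample)["RUNNABLE"])  # 1
```

Mostly-WAITING handlers point away from handler starvation and toward the
accept queue, which matches the ListenDrop evidence in this thread.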
> > We faced a similar issue when under-replication was high in the
> > cluster, due to the filesystem writeLock.
>
> On Tue, 22 Feb 2022, 8:37 am Amith sha, <[email protected]> wrote:
>
> Check your system metrics too.
>
> On Mon, Feb 21, 2022, 10:52 PM Tale Hive <[email protected]> wrote:
>
> Yeah, the next step for me is to perform a tcpdump just when the
> problem occurs.
> I want to know whether my namenode does not accept connections because
> it freezes for some reason, or because there are too many connections
> at a time.
>
> My delay is far worse than 2s; sometimes an hdfs dfs -ls -d
> /user/<my-user> takes 20s, 43s, and rarely it is even longer than 1
> minute.
> And during this time the CallQueue is OK, the heap is OK; I don't find
> any metric which could show me a problem inside the namenode JVM.
>
> Best regards.
>
> T@le
>
> On Mon, 21 Feb 2022 at 16:32, Amith sha <[email protected]> wrote:
>
> If you are still concerned about the delay of > 2 s then you need to
> do a benchmark with and without load. It will help to find the root
> cause of the problem.
>
> On Mon, Feb 21, 2022, 1:52 PM Tale Hive <[email protected]> wrote:
>
> Hello Amith.
>
> Hm, not a bad idea. If I increase the size of the listenQueue and if I
> increase the timeout, the combination of both may mitigate the problem
> more than just increasing the listenQueue size.
> It won't solve the problem of acceptance speed, but it could help.
>
> Thanks for the suggestion!
>
> T@le
>
> On Mon, 21 Feb 2022 at 02:33, Amith sha <[email protected]> wrote:
>
> org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
> while waiting for channel to be ready for connect.
>
> The connection timed out after 20000 ms; I suspect this value is very
> low for a namenode with 75GB of heap usage. Can you increase the value
> and check the connection?
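[Editor's note] A sketch of how such a client-side timeout bump might look in
core-site.xml, assuming the standard ipc.client.rpc-timeout.ms property; the
60000 ms value is only illustrative. Note that the 20000 ms figure in the error
is a *connect* timeout, which may instead be governed by
ipc.client.connect.timeout; worth verifying for your Hadoop version.

```xml
<!-- Illustrative values only; confirm the governing property
     for your Hadoop version before relying on this. -->
<property>
  <name>ipc.client.rpc-timeout.ms</name>
  <value>60000</value>
</property>
<property>
  <name>ipc.client.connect.timeout</name>
  <value>60000</value>
</property>
```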
> To increase the value, modify this property:
> ipc.client.rpc-timeout.ms - core-site.xml (if not present then add it
> to the core-site.xml)
>
> Thanks & Regards
> Amithsha
>
> On Fri, Feb 18, 2022 at 9:17 PM Tale Hive <[email protected]> wrote:
>
> Hello Tom.
>
> Sorry for my absence of answers; I don't know why gmail puts your mail
> into spam -_-.
>
> To answer you:
>
> - The metrics callQueueLength, avgQueueTime, avgProcessingTime and the
>   GC metrics are all OK
> - Threads are plenty sufficient (I can see the metrics for them too,
>   and I am below 200, the number I have for the 8020 RPC server)
>
> Did you see my other answers about this problem?
> I would be interested to have your opinion on them!
>
> Best regards.
>
> T@le
>
> On Tue, 15 Feb 2022 at 02:16, tom lee <[email protected]> wrote:
>
> It might be helpful to analyze namenode metrics and logs.
>
> What about some key metrics? Examples are callQueueLength,
> avgQueueTime, avgProcessingTime and GC metrics.
>
> In addition, is the number of threads
> (dfs.namenode.service.handler.count) in the namenode sufficient?
>
> Hopefully this will help.
>
> Best regards.
> Tom
>
> On Mon, 14 Feb 2022 at 23:57, Tale Hive <[email protected]> wrote:
>
> Hello.
>
> I encounter a strange problem with my namenode. I have the following
> architecture:
>
> - Two namenodes in HA
> - 600 datanodes
> - HDP 3.1.4
> - 150 million files and folders
>
> Sometimes, when I query the namenode with the hdfs client, I get a
> timeout error like this:
>
> hdfs dfs -ls -d /user/myuser
>
> 22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
> org.apache.hadoop.net.ConnectTimeoutException: Call From
> <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
> failed on socket timeout exception:
> org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
> while waiting for channel to be ready for connect.
> ch : java.nio.channels.SocketChannel[connection-pending
> remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
> For more details see: http://wiki.apache.org/hadoop/SocketTimeout,
> while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
> <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
> attempts. Trying to failover after sleeping for 2694ms.
>
> I checked the heap of the namenode and there is no problem (I have 75
> GB of max heap, I'm around 50 used GB).
> I checked the threads of the clientRPC for the namenode and I'm at
> 200, which respects the recommendations from the Hadoop Operations
> book.
> I have serviceRPC enabled to prevent any problem which could be coming
> from datanodes or ZKFC.
> General resources seem OK: CPU usage is pretty fine, same for memory,
> network and IO.
> No firewall is enabled on my namenodes nor on my client.
>
> I was wondering what could cause this problem, please?
>
> Thank you in advance for your help!
>
> Best regards.
>
> T@le
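[Editor's note] The 20 x log2(n) handler-count rule that comes up repeatedly in
this thread is easy to sanity-check. A minimal sketch; the node counts are just
the examples used above:

```python
import math

def recommended_handler_count(num_nodes: int) -> int:
    """Rule of thumb from the thread: 20 * log2(number of cluster nodes)."""
    return int(round(20 * math.log2(num_nodes)))

# 600 datanodes, as in this cluster: 20 * log2(600) ~= 185 handlers
print(recommended_handler_count(600))   # 185
# A 1024-node cluster lands exactly on 200
print(recommended_handler_count(1024))  # 200
```

So the configured 200 handlers slightly exceed the ~185 the formula suggests
for 600 nodes, consistent with the claim that handler count is not the
bottleneck here.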
