Thanks Rakesh.

"*Perhaps there could be high chance of searching for data blocks which it can move around to balance the cluster*."
I could see the log statements below after enabling DEBUG mode:

2016-09-08 06:32:06,574 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN sending #49230
2016-09-08 06:32:06,574 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 0ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN sending #49231
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN got value #49229
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 1ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN got value #49230
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN sending #49232
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 1ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN sending #49233
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN got value #49231
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 0ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN sending #49234
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN got value #49232
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 0ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN sending #49235
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN got value #49233
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getBlocks took 0ms
2016-09-08 06:32:06,575 DEBUG org.apache.hadoop.ipc.Client: IPC Client (685788708) connection to nn-host/10.103.108.201:8020 from hadoop/host@HOST_DOMAIN got value #49234

The same getBlocks() call keeps repeating!

--Senthil

On Thu, Sep 8, 2016 at 7:46 PM, Rakesh Radhakrishnan <[email protected]> wrote:

> Have you taken multiple thread dumps (jstack) and observed the operations
> which are performing during this period of time. Perhaps there could be
> high chance of searching for data blocks which it can move around to
> balance the cluster.
>
> Could you tell me the used space and available space values. Have you
> tried changing the threshold to a lower value, may be 10 or 5 and what
> happens with this value. Also, I think there is no log messages during 15
> mins time period, any possibility of enabling debug log priority and try to
> dig more about the problem.
>
> Rakesh
>
> On Thu, Sep 8, 2016 at 7:44 PM, Rakesh Radhakrishnan <[email protected]>
> wrote:
>
>> Have you taken multiple thread dumps (jstack) and observed the operations
>> which are performing during this period of time. Perhaps there could be
>> high chance of searching for data blocks which it can move around to
>> balance the cluster.
>>
>> Could you tell me the used space and available space values. Have you
>> tried changing the threshold to a lower value, may be 10 or 5 and what
>> happens with this value.
>> Also, I think there is no log messages during 15
>> mins time period, any possibility of enabling debug log priority and try to
>> dig more about the problem.
>>
>> Rakesh
>>
>> On Thu, Sep 8, 2016 at 6:15 PM, Senthil Kumar <[email protected]>
>> wrote:
>>
>>> Hi All, we are in the situation to balance the cluster data since median
>>> reached 98%. I started the balancer as below.
>>>
>>> Hadoop Version: Hadoop 2.4.1
>>>
>>> /apache/hadoop/sbin/start-balancer.sh -threshold 30
>>>
>>> Once I start the balancer it goes well for the first 8-10 minutes.
>>> The balancer was moving blocks quickly for the first 10 minutes. Not sure
>>> what is happening in the cluster after some time (say 10 minutes), but the
>>> balancer is almost stuck.
>>>
>>> Log excerpts:
>>>
>>> 2016-09-08 04:58:15,653 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved blk_-5830766563502877304_1279767737 with size=134217728 from 10.103.21.27:1004 to 10.142.21.56:1004 through 10.103.21.27:1004
>>>
>>> 2016-09-08 04:59:14,426 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved blk_2601479900_1104500421142 with size=268435456 from 10.103.84.51:1004 to 10.142.18.27:1004 through 10.103.84.16:1004
>>>
>>> 2016-09-08 05:01:15,037 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved blk_3073791211_1104972921837 with size=268435456 from 10.103.21.27:1004 to 10.142.21.56:1004 through 10.103.21.42:1004
>>>
>>> [05:16]:[hadoop@lvsaishdc3sn0002:~]$ date
>>> Thu Sep 8 05:16:53 GMT+7 2016
>>> [05:16]:[hadoop@lvsaishdc3sn0002:~]$ jps
>>> 1003 Balancer
>>> 20388 Jps
>>>
>>> Last Block Mover Timestamp : 05:01
>>> Current Timestamp : 05:16
>>>
>>> For almost 15 minutes no blocks were moved by the balancer. What could be
>>> the issue here? A restart would help us start moving again.
>>>
>>> It's not even passing iteration 1.
>>>
>>> I found one thread discussing the same issue:
>>>
>>> http://lucene.472066.n3.nabble.com/A-question-about-Balancer-in-HDFS-td4118505.html
>>>
>>> Pls suggest here how to balance the cluster.
>>>
>>> --Senthil
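
One way to check whether the balancer really is looping on getBlocks, rather than reading the DEBUG output line by line, is to summarise the log. The Python sketch below is only an illustration: the function name summarize and the two regular expressions are assumptions derived from the log lines quoted in this thread, not anything shipped with Hadoop. It counts ProtobufRpcEngine calls per second and per method, and remembers the timestamp of the last "Successfully moved" line.

```python
import re
from collections import Counter

# Timestamp at the start of a log line, e.g. "2016-09-08 06:32:06,574".
# Only the part up to the second is captured; milliseconds are dropped.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}"

# Patterns are assumptions based on the lines quoted in this thread.
RPC_CALL = re.compile(
    TS + r"\s+DEBUG\s+org\.apache\.hadoop\.ipc\.ProtobufRpcEngine: "
    r"Call: (\w+) took (\d+)ms"
)
BLOCK_MOVED = re.compile(
    TS + r"\s+INFO\s+org\.apache\.hadoop\.hdfs\.server\.balancer\.Balancer: "
    r"Successfully moved"
)


def summarize(lines):
    """Return (calls, last_move).

    calls maps (second, rpc_method) to the number of RPC calls logged;
    last_move is the timestamp of the last successful block move, or None.
    """
    calls = Counter()
    last_move = None
    for line in lines:
        m = RPC_CALL.search(line)
        if m:
            calls[(m.group(1), m.group(2))] += 1
            continue
        m = BLOCK_MOVED.search(line)
        if m:
            last_move = m.group(1)
    return calls, last_move
```

Passing an open balancer log file to summarize() gives both values at once. If getBlocks counts stay high while last_move stops advancing, that would be consistent with the balancer spending its time searching for movable blocks, as Rakesh suggested, though it does not prove it.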
