This email thread should go to user@, which is for end-user questions and discussion, instead of hdfs-dev@.
My 2 cents:

> The original design of Balancer is intentionally to make it run slowly so
> that the balancing activities won't affect the normal cluster activities
> and the running jobs.

The limit on the maximum amount of data that the Balancer will move between a chosen datanode pair is 10GB. This is not configurable in the 2.4 stack, however. Please refer to https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html

Thanks.
L
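(For reference: in the 2.4 branch this cap is a compile-time constant in Balancer.java, roughly in the form sketched below. The wrapper class is illustrative, not the Apache code; see the source link further down for the exact form. Later releases expose the same limit as the dfs.balancer.max-size-to-move property, which is the kind of knob the article above covers.)

    // Approximate form of the hard-coded cap in 2.4-era Balancer.java.
    // Because it is a compile-time constant in 2.4.x, there is no
    // configuration key that can raise it.
    class BalancerMoveCapSketch {
        static final long GB = 1L << 30;              // 1 GB
        static final long MAX_SIZE_TO_MOVE = 10 * GB; // cap per <source, target>
                                                      // pair in each iteration
    }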
On Aug 11, 2016, at 3:21 AM, Senthil Kumar <[email protected]> wrote:

Thanks Lars for your quick response! Here is my cluster utilization:

    DFS Used%:        74.39%
    DFS Remaining%:   25.60%
    Block Pool Used%: 74.39%
    DataNode usage:   Min 1.25% | Median 99.72% | Max 99.99% | stdev 22.53%

Hadoop version: 2.4.1

Let's take an example:

    Cluster live nodes:    1000
    Capacity used 95-99%:  700 nodes
    Capacity used 90-95%:  50 nodes
    Capacity used < 90%:   250 nodes

I'm looking for an option to quickly balance data from the nodes in the 90-95% category onto the < 90% nodes. I know there are options like -include and -exclude, but they are not helping me (or am I not using them effectively? Please advise how to use these options properly if I want to balance my cluster as described above).

Is there any option like --force-balance (taking two extra inputs, force-balance-source-hosts(file) and force-balance-dest-hosts(file))? That way, I believe, we could achieve balancing in "urgency mode" when 90% of the nodes are hitting 99% disk usage, or when the median is at 95% and above. Please add your thoughts here.

Here is the code that constructs the network topology by categorizing nodes as over-utilized, above-average, below-average, and under-utilized. Sometimes I see nodes with 70% usage also land in the over-utilized bucket (tried with thresholds from 10 to 30). Correct me if anything is wrong in my understanding.

https://github.com/apache/hadoop/tree/release-2.4.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer

    /* create network topology and all data node lists:
     * overloaded, above-average, below-average, and underloaded.
     * We alternate the accessing of the given datanodes array either by
     * an increasing order or a decreasing order.
     */
    long overLoadedBytes = 0L, underLoadedBytes = 0L;
    for (DatanodeInfo datanode : DFSUtil.shuffle(datanodes)) {
      if (datanode.isDecommissioned() || datanode.isDecommissionInProgress()) {
        continue; // ignore decommissioning or decommissioned nodes
      }
      cluster.add(datanode);
      BalancerDatanode datanodeS;
      final double avg = policy.getAvgUtilization();
      if (policy.getUtilization(datanode) > avg) {
        datanodeS = new Source(datanode, policy, threshold);
        if (isAboveAvgUtilized(datanodeS)) {
          this.aboveAvgUtilizedDatanodes.add((Source)datanodeS);
        } else {
          assert(isOverUtilized(datanodeS)) :
            datanodeS.getDisplayName() + " is not an overUtilized node";
          this.overUtilizedDatanodes.add((Source)datanodeS);
          overLoadedBytes += (long)((datanodeS.utilization - avg
              - threshold) * datanodeS.datanode.getCapacity() / 100.0);
        }
      } else {
        datanodeS = new BalancerDatanode(datanode, policy, threshold);
        if (isBelowOrEqualAvgUtilized(datanodeS)) {
          this.belowAvgUtilizedDatanodes.add(datanodeS);
        } else {
          assert isUnderUtilized(datanodeS) : "isUnderUtilized("
              + datanodeS.getDisplayName() + ")=" + isUnderUtilized(datanodeS)
              + ", utilization=" + datanodeS.utilization;
          this.underUtilizedDatanodes.add(datanodeS);
          underLoadedBytes += (long)((avg - threshold
              - datanodeS.utilization) * datanodeS.datanode.getCapacity() / 100.0);
        }
      }
      datanodeMap.put(datanode.getDatanodeUuid(), datanodeS);
    }

Could someone help me understand the balancing policy, and which parameters I should use to balance the cluster (i.e., bring the median down)?

--Senthil
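(For illustration, here is a minimal, self-contained sketch of those classification bands. BalancerBandsSketch and classify are illustrative names, not the Apache code; the band logic mirrors the excerpt above, and the 74.39% average comes from the figures quoted earlier in the thread.)

    // Mirrors the band logic of the Balancer excerpt above: nodes above the
    // average become candidate sources, nodes at or below it become targets.
    public class BalancerBandsSketch {

        // utilization, avg, and threshold are percentages (0-100); a node's
        // utilization is dfsUsed / capacity * 100, as in the excerpt.
        static String classify(double utilization, double avg, double threshold) {
            if (utilization > avg + threshold)  return "over-utilized";  // source
            if (utilization > avg)              return "above-average";  // source
            if (utilization >= avg - threshold) return "below-average";  // target
            return "under-utilized";                                     // target
        }

        public static void main(String[] args) {
            final double avg = 74.39; // cluster-wide DFS Used% from this thread
            for (double threshold : new double[] {10, 30}) {
                System.out.printf("threshold=%2.0f -> 'balanced' band = [%.2f%%, %.2f%%]%n",
                        threshold, avg - threshold, avg + threshold);
                for (double u : new double[] {99.9, 70.0}) {
                    System.out.printf("  node at %.1f%% -> %s%n",
                            u, classify(u, avg, threshold));
                }
            }
        }
    }

(Note how with threshold 30 the band tops out above 100%, so no node can be classified over-utilized at all. Also, by this logic a 70%-full node should never land in the over-utilized bucket when the average is 74.39%, so if you are seeing that, it may be worth double-checking which average the policy is computing, e.g. the datanode vs. blockpool policy.)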
On Wed, Aug 10, 2016 at 8:21 PM, Lars Francke <[email protected]> wrote:

Hi Senthil,

I'm not sure I fully understand. If you're using a threshold of 30, that means you have a range of 60% that the balancer would consider to be okay. Example: say the used space divided by the total available space in the cluster is 80%. Then, with a 30% threshold, the balancer would try to bring all nodes within the range of 50-100% utilisation.

The default threshold is 10%, and that's still a fairly large range, especially on clusters that are almost at capacity. So a threshold of 5 or even lower might work for you.

What is your utilisation in the cluster (used space / available space)?

Cheers,
Lars

On Wed, Aug 10, 2016 at 3:27 PM, Senthil Kumar <[email protected]> wrote:

Hi Team,

We are running a big cluster (3000 nodes), and many times the median utilization climbs to 99.99% (on 80% of the DNs). The Balancer is running in the cluster all the time, but the median is still not coming down, i.e. below 90%.

Here is how I start the balancer:

    /apache/hadoop/sbin/start-balancer.sh -Ddfs.balance.bandwidthPerSec=104857600 -threshold 30

What is the recommended value for the threshold? Is there any way to pass a parameter to move blocks only from over-utilized (98-100%) nodes to under-utilized ones? Please advise!

Regards,
Senthil
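(Putting Lars's advice into the invocation quoted above, a tighter run would change only the -threshold value; the bandwidth setting here is simply the one already used in this thread:

    /apache/hadoop/sbin/start-balancer.sh -Ddfs.balance.bandwidthPerSec=104857600 -threshold 5

With the 74.39% average quoted earlier, a threshold of 5 asks the balancer to pull every node into roughly the 69-79% band.)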