This email thread should go to user@, which is for end-user questions and discussion, instead of hdfs-dev@.
My 2 cents:

> The original design of Balancer is intentionally to make it run slowly so
> that the balancing activities won't affect the normal cluster activities
> and the running jobs.

The limit on the maximum amount of data that the Balancer will move between a chosen datanode pair is 10GB. This is not configurable in the 2.4 stack, however. Please refer to https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html

Thanks.
L
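(For reference: in the 2.4 branch this cap is a compile-time constant in Balancer.java, roughly in the form sketched below. The wrapper class is illustrative, not the Apache code; see the source link further down for the exact form. Later releases expose the same limit as the dfs.balancer.max-size-to-move property, which is the kind of knob the article above covers.)

    // Approximate form of the hard-coded cap in 2.4-era Balancer.java.
    // Because it is a compile-time constant in 2.4.x, there is no
    // configuration key that can raise it.
    class BalancerMoveCapSketch {
        static final long GB = 1L << 30;              // 1 GB
        static final long MAX_SIZE_TO_MOVE = 10 * GB; // cap per <source, target>
                                                      // pair in each iteration
    }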
On Aug 11, 2016, at 3:21 AM, Senthil Kumar <[email protected]> wrote:

Thanks Lars for your quick response! Here is my cluster utilization:

    DFS Used%:        74.39%
    DFS Remaining%:   25.60%
    Block Pool Used%: 74.39%
    DataNode usage:   Min 1.25% | Median 99.72% | Max 99.99% | stdev 22.53%

Hadoop version: 2.4.1

Let's take an example:

    Cluster live nodes:    1000
    Capacity used 95-99%:  700 nodes
    Capacity used 90-95%:  50 nodes
    Capacity used < 90%:   250 nodes

I'm looking for an option to quickly balance data from the nodes in the 90-95% category onto the < 90% nodes. I know there are options like -include and -exclude, but they are not helping me (or am I not using them effectively? Please advise how to use these options properly if I want to balance my cluster as described above).

Is there any option like --force-balance (taking two extra inputs, force-balance-source-hosts(file) and force-balance-dest-hosts(file))? That way, I believe, we could achieve balancing in "urgency mode" when 90% of the nodes are hitting 99% disk usage, or when the median is at 95% and above. Please add your thoughts here.

Here is the code that constructs the network topology by categorizing nodes as over-utilized, above-average, below-average, and under-utilized. Sometimes I see nodes with 70% usage also land in the over-utilized bucket (tried with thresholds from 10 to 30). Correct me if anything is wrong in my understanding.

https://github.com/apache/hadoop/tree/release-2.4.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer

    /* create network topology and all data node lists:
     * overloaded, above-average, below-average, and underloaded.
     * We alternate the accessing of the given datanodes array either by
     * an increasing order or a decreasing order.
     */
    long overLoadedBytes = 0L, underLoadedBytes = 0L;
    for (DatanodeInfo datanode : DFSUtil.shuffle(datanodes)) {
      if (datanode.isDecommissioned() || datanode.isDecommissionInProgress()) {
        continue; // ignore decommissioning or decommissioned nodes
      }
      cluster.add(datanode);
      BalancerDatanode datanodeS;
      final double avg = policy.getAvgUtilization();
      if (policy.getUtilization(datanode) > avg) {
        datanodeS = new Source(datanode, policy, threshold);
        if (isAboveAvgUtilized(datanodeS)) {
          this.aboveAvgUtilizedDatanodes.add((Source)datanodeS);
        } else {
          assert(isOverUtilized(datanodeS)) :
            datanodeS.getDisplayName() + " is not an overUtilized node";
          this.overUtilizedDatanodes.add((Source)datanodeS);
          overLoadedBytes += (long)((datanodeS.utilization - avg
              - threshold) * datanodeS.datanode.getCapacity() / 100.0);
        }
      } else {
        datanodeS = new BalancerDatanode(datanode, policy, threshold);
        if (isBelowOrEqualAvgUtilized(datanodeS)) {
          this.belowAvgUtilizedDatanodes.add(datanodeS);
        } else {
          assert isUnderUtilized(datanodeS) : "isUnderUtilized("
              + datanodeS.getDisplayName() + ")=" + isUnderUtilized(datanodeS)
              + ", utilization=" + datanodeS.utilization;
          this.underUtilizedDatanodes.add(datanodeS);
          underLoadedBytes += (long)((avg - threshold
              - datanodeS.utilization) * datanodeS.datanode.getCapacity() / 100.0);
        }
      }
      datanodeMap.put(datanode.getDatanodeUuid(), datanodeS);
    }

Could someone help me understand the balancing policy, and which parameters I should use to balance the cluster (i.e., bring the median down)?

--Senthil
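(For illustration, here is a minimal, self-contained sketch of those classification bands. BalancerBandsSketch and classify are illustrative names, not the Apache code; the band logic mirrors the excerpt above, and the 74.39% average comes from the figures quoted earlier in the thread.)

    // Mirrors the band logic of the Balancer excerpt above: nodes above the
    // average become candidate sources, nodes at or below it become targets.
    public class BalancerBandsSketch {

        // utilization, avg, and threshold are percentages (0-100); a node's
        // utilization is dfsUsed / capacity * 100, as in the excerpt.
        static String classify(double utilization, double avg, double threshold) {
            if (utilization > avg + threshold)  return "over-utilized";  // source
            if (utilization > avg)              return "above-average";  // source
            if (utilization >= avg - threshold) return "below-average";  // target
            return "under-utilized";                                     // target
        }

        public static void main(String[] args) {
            final double avg = 74.39; // cluster-wide DFS Used% from this thread
            for (double threshold : new double[] {10, 30}) {
                System.out.printf("threshold=%2.0f -> 'balanced' band = [%.2f%%, %.2f%%]%n",
                        threshold, avg - threshold, avg + threshold);
                for (double u : new double[] {99.9, 70.0}) {
                    System.out.printf("  node at %.1f%% -> %s%n",
                            u, classify(u, avg, threshold));
                }
            }
        }
    }

(Note how with threshold 30 the band tops out above 100%, so no node can be classified over-utilized at all. Also, by this logic a 70%-full node should never land in the over-utilized bucket when the average is 74.39%, so if you are seeing that, it may be worth double-checking which average the policy is computing, e.g. the datanode vs. blockpool policy.)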
On Wed, Aug 10, 2016 at 8:21 PM, Lars Francke <[email protected]> wrote:

Hi Senthil,

I'm not sure I fully understand. If you're using a threshold of 30, that means you have a range of 60% that the balancer would consider to be okay. Example: say the used space divided by the total available space in the cluster is 80%. Then, with a 30% threshold, the balancer would try to bring all nodes within the range of 50-100% utilisation.

The default threshold is 10%, and that's still a fairly large range, especially on clusters that are almost at capacity. So a threshold of 5 or even lower might work for you.

What is your utilisation in the cluster (used space / available space)?

Cheers,
Lars

On Wed, Aug 10, 2016 at 3:27 PM, Senthil Kumar <[email protected]> wrote:

Hi Team,

We are running a big cluster (3000 nodes), and many times the median utilization climbs to 99.99% (on 80% of the DNs). The Balancer is running in the cluster all the time, but the median is still not coming down, i.e. below 90%.

Here is how I start the balancer:

    /apache/hadoop/sbin/start-balancer.sh -Ddfs.balance.bandwidthPerSec=104857600 -threshold 30

What is the recommended value for the threshold? Is there any way to pass a parameter to move blocks only from over-utilized (98-100%) nodes to under-utilized ones? Please advise!

Regards,
Senthil
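(Putting Lars's advice into the invocation quoted above, a tighter run would change only the -threshold value; the bandwidth setting here is simply the one already used in this thread:

    /apache/hadoop/sbin/start-balancer.sh -Ddfs.balance.bandwidthPerSec=104857600 -threshold 5

With the 74.39% average quoted earlier, a threshold of 5 asks the balancer to pull every node into roughly the 69-79% band.)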