Thanks Zian. I am a bit confused now.

When I posted the question on the Hortonworks community, I got a different
opinion from another community member (a Hortonworks employee, I presume):

https://community.hortonworks.com/questions/192228/capacity-scheduler-maximum-capacity.html?childToView=192230#answer-192230

Regarding your last point on having an explicit maximum value for each
queue, I still don't understand why this needs to be specified if the
"general" recommendation is to have it at 100% and change it only if the
use case is different. Given your earlier answer, my understanding is that
setting this to 100% for all queues should cause no harm, given the
other parameters that come into play when multiple queues are
involved.

Please correct me if I am wrong.

Thanks

On 22 May 2018 at 00:03:04, Zian Chen ([email protected]) wrote:

Hi Greenhorn,

Actually, it is the latter. The percentage you specify is the upper
bound on the share of resources this queue can take from its
immediate parent queue. For example, take a queue hierarchy like the one below:

        root
       /    \
      a      b
     / \
   a1   a2

If we set the max-cap of a1 to 100%, it means a1 can get at most 100% of the
resources of its immediate parent a, not of root.
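To illustrate the arithmetic, here is a minimal Python sketch, not the scheduler's actual code. It assumes the cluster-wide upper bound of a queue is the product of the max-cap percentages along its path from root. The queue names come from the example above; the percentages for a, b and a2 are made-up values for illustration.

```python
# Hypothetical max-cap settings, each relative to the immediate parent queue.
MAX_CAPACITY = {
    "root": 100.0,
    "root.a": 60.0,      # assumed value for illustration
    "root.b": 40.0,      # assumed value for illustration
    "root.a.a1": 100.0,  # the case from the example: 100% of a, not of root
    "root.a.a2": 50.0,   # assumed value for illustration
}


def absolute_max_capacity(queue_path, max_capacity=MAX_CAPACITY):
    """Return the queue's upper bound as a percentage of the whole cluster.

    Walks the path from root down to the queue and scales the running
    percentage by each queue's max-cap relative to its parent.
    """
    parts = queue_path.split(".")
    percent = 100.0
    for depth in range(1, len(parts) + 1):
        prefix = ".".join(parts[:depth])
        percent = percent * max_capacity[prefix] / 100.0
    return percent


if __name__ == "__main__":
    for path in ("root.a", "root.a.a1", "root.a.a2"):
        print(path, absolute_max_capacity(path))
```

With these assumed numbers, a1 at 100% is still bounded by whatever a itself is allowed to take from root.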

More importantly, many factors limit or control how many resources a
queue can get inside a cluster, such as guaranteed resources,
user-limit, node labels, etc. Sometimes applications in a queue can get
resources beyond the queue's guaranteed share when the cluster has idle
resources available; this is how elasticity is supported. But max-cap is
always a hard limit on how many resources a queue can get.
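A deliberately simplified model of that paragraph, as a Python sketch. The function name and its parameters are hypothetical, all values are percentages of the parent queue, and user-limit, node labels and preemption are ignored; the real scheduler logic is considerably more involved.

```python
def usable_share(demand, guaranteed, idle, max_cap):
    """Share a queue could end up with under this simplified model.

    demand     -- what the queue's applications are asking for
    guaranteed -- the queue's guaranteed (minimum) capacity
    idle       -- capacity currently unused elsewhere under the parent
    max_cap    -- the queue's maximum capacity (hard limit)
    """
    # Elasticity: the queue may grow past its guarantee into idle capacity.
    elastic_ceiling = guaranteed + idle
    # The hard limit always applies, however much idle capacity there is.
    return min(demand, elastic_ceiling, max_cap)


if __name__ == "__main__":
    # Plenty of idle capacity and max-cap at 100: demand is met in full.
    print(usable_share(demand=80, guaranteed=30, idle=70, max_cap=100))
    # Same situation, but max-cap at 50 caps the queue.
    print(usable_share(demand=80, guaranteed=30, idle=70, max_cap=50))
```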

So there is no default max-cap value for any queue; we need this
parameter to set this hard limit.

Hope this helps.
Thanks

On May 21, 2018, at 2:51 PM, Greenhorn Techie <[email protected]>
wrote:

Thanks Zian. Is maximum capacity a global value, i.e. whenever I specify a
percentage here, is it taken from the overall cluster’s capacity or
only from the parent queue? I thought it's the former.

Also, if setting to 100% doesn't cause any harm, why is it explicitly
mentioned as a parameter instead of a default / implied value for any queue?

Thanks


On 21 May 2018 at 22:19:05, Zian Chen ([email protected]) wrote:

In my humble opinion, it’s safe to set maximum capacity to 100% for each
queue, because the value indicates the maximum percentage of its parent
queue's resources that the queue can use, so setting the upper limit to
100% won’t introduce any hidden danger.


On May 21, 2018, at 9:04 AM, Greenhorn Techie <[email protected]>
wrote:

Hi,

In our setup, we are using the YARN Capacity Scheduler and have many queues
set up in a hierarchical fashion with well-configured minimum capacities.
However, I am wondering what the best practice is for setting the maximum
capacity value, i.e. for the parameter
*yarn.scheduler.capacity.<queue-path>.maximum-capacity*?

Is it advisable to have each queue configured with a maximum capacity of
100%, or something like 90 to 95% with some leeway for the default queue? In
summary, what are the best practices for making full use of cluster capacity
when it's available while still honouring the minimum queue capacities?

Thanks
