Re: YARN - How is a node for a container determined?

Grant Overby Tue, 29 Aug 2017 18:31:59 -0700

Most of the applications are Twill apps and are somewhat long running, but
not perpetual: a few hours to a day. Many of the apps (about half) have
a lot of idle time. These apps come from across the enterprise; I don't know
why they're idle. There are also a few MR, Tez, and Spark apps in the mix.

If we don't overcommit (or only modestly overcommit) vCores and memory, then
having the apps stacked on fewer boxes doesn't have much of an impact
compared to spreading them across more nodes.

The number of applications running at a given time is very fluid. We have
an elastic infrastructure that can add and remove YARN VMs as needed. It
needs a bit of scripting to be "YARN aware," but that part isn't rocket
surgery.

Removing those VMs after the workload drops is hampered when the
containers are spread out. Consider a node that is running only 1 container
and is 10% utilized: I wouldn't want a new container landing on this node,
as it's ripe for being removed.
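
As a rough illustration of the scale-down side, the sketch below picks the
VMs that would be candidates for removal. It is only a sketch under
assumptions: the `Node` record, its fields, and the 10% threshold are made
up for this example and are not part of any YARN API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical snapshot of one NodeManager (not a YARN type)."""
    name: str
    is_overflow_vm: bool      # True for elastic VMs, False for permanent nodes
    running_containers: int
    used_fraction: float      # 0.0 .. 1.0 of the node's allocatable resources


def removal_candidates(nodes, max_used=0.10):
    """Return overflow VMs at or below max_used utilization, emptiest first."""
    candidates = [
        n for n in nodes
        if n.is_overflow_vm and n.used_fraction <= max_used
    ]
    # Fewest running containers first, then lowest utilization.
    return sorted(candidates, key=lambda n: (n.running_containers, n.used_fraction))


nodes = [
    Node("dn01", False, 12, 0.80),
    Node("vm01", True, 1, 0.10),
    Node("vm02", True, 0, 0.00),
    Node("vm03", True, 9, 0.65),
]
print([n.name for n in removal_candidates(nodes)])
```

A scheduler that keeps new containers off the nodes returned here would let
them drain naturally instead of requiring preemption.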

I also prefer to place containers on nodes that are also HDFS data nodes so
that they can benefit from the locality. It's not desirable for a new
container to land on an overflow VM if there are resources on a proper node.
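
The ordering described above could be sketched roughly as follows: among
nodes that can fit the request, prefer HDFS data nodes over overflow VMs,
and within each group prefer the node with the least free capacity so that
containers pack densely. This is not YARN's algorithm; `Candidate`, its
fields, and `pick_node` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical view of a node's free resources (not a YARN type)."""
    name: str
    is_data_node: bool    # True if the node is also an HDFS data node
    free_vcores: int
    free_mem_mb: int


def pick_node(candidates: List[Candidate], vcores: int, mem_mb: int) -> Optional[Candidate]:
    """Best-fit choice: data nodes first, then the tightest fit."""
    fitting = [
        c for c in candidates
        if c.free_vcores >= vcores and c.free_mem_mb >= mem_mb
    ]
    if not fitting:
        return None
    # False sorts before True, so data nodes win; then least free memory,
    # then least free vCores.
    return min(fitting, key=lambda c: (not c.is_data_node, c.free_mem_mb, c.free_vcores))


candidates = [
    Candidate("dn01", True, 2, 4096),
    Candidate("dn02", True, 8, 32768),
    Candidate("vm01", False, 1, 2048),
    Candidate("vm02", False, 4, 8192),
]
chosen = pick_node(candidates, vcores=2, mem_mb=4096)
print(chosen.name if chosen else "no node fits")
```

An overflow VM is chosen only when no data node can satisfy the request.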

As a stretch goal, I also may want to prioritize some jobs to the proper
nodes and others to the VMs in the future.

I don't want to preempt containers and make them restart on another node.

There's a preferred node feature which I think I can beat into submission,
but I'd much rather adjust a value or plug in a proper, new algo.

PS: Sorry Philippe. I hit reply instead of reply all, so you'll
unfortunately get spammed with two copies of this.

On Tue, Aug 29, 2017 at 6:12 AM, Philippe Kernévez <[email protected]>
wrote:

> "densely pack containers on fewer nodes": quite surprising, +1 with
> Daemeon
>
> YARN has node labels that can be used for that.
> The classic example is the need for specific hardware for some processing.
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
>
> Regards,
> Philippe
>
> On Tue, Aug 29, 2017 at 12:53 AM, daemeon reiydelle <[email protected]>
> wrote:
>
>> Perhaps you can go into a bit more detail? Especially for e.g. a map job
>> (or reduce in mapR), this seems like a major antipattern.
>>
>>
>>
>>
>> Daemeon C.M. Reiydelle
>> San Francisco 1.415.501.0198
>> London 44 020 8144 9872
>>
>>
>> On Mon, Aug 28, 2017 at 3:37 PM, Grant Overby <[email protected]>
>> wrote:
>>
>>> When YARN receives a request for a container, which can be met by many
>>> nodes, what is the algorithm for determining which node is given the
>>> container?
>>>
>>> Is this tunable? I'd like to densely pack containers on fewer nodes.
>>>
>>> A pointer to source code would be nice.
>>>
>>>
>>
>
>
> --
> Philippe Kernévez
>
>
>
> Directeur technique (Suisse),
> [email protected]
> +41 79 888 33 32
>
> Retrouvez OCTO sur OCTO Talk : http://blog.octo.com
> OCTO Technology http://www.octo.ch
>
