On 3/9/21 3:16 AM, Ward Poelmans wrote:

Hi Prentice,

On 8/03/2021 22:02, Prentice Bisbal wrote:

I have a very heterogeneous cluster with several different generations of
AMD and Intel processors, and we use this method quite effectively.
Could you elaborate a bit more on how you manage that? Do you force your
users to pick a feature? What if a user submits a multi-node job, can
you make sure it will not start over a mix of avx512 and avx2 nodes?

I don't force the users to pick a feature, and to make matters worse, I think our login nodes are newer than some of the compute nodes, so it's entirely possible that if someone really optimizes their code for one of the login nodes, their job could get assigned to a node that doesn't understand the instruction set, resulting in the dreaded "Illegal Instruction" error. Surprisingly, this has only happened a few times in the 5 years I've been at this job.

I assume most users would want to use the newest and fastest processors if given the choice, so I set the priority weighting of the nodes so that the newest nodes are highest priority, and the oldest nodes the lowest priority.
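
Concretely, that just means setting Weight on the node definitions in slurm.conf; Slurm allocates the lowest-weight nodes first, so the newest hardware gets the smallest numbers. A rough sketch (node names, counts, and feature strings here are made up, not our actual config):

    # slurm.conf fragment - lower Weight = allocated first
    NodeName=epyc[001-032]    Weight=10  Feature=7281,avx2   # newest, preferred
    NodeName=opteron[001-048] Weight=100 Feature=6376        # oldest, used last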

The only way to make sure a job sticks to a certain instruction set is if the user specifies the processor model, rather than the instruction-set family. For example:

-C 7281 will get you only AMD EPYC 7281 processors

and

-C 6376 will get you only AMD Opteron 6376 processors
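
Those model numbers are just feature strings attached to the nodes in slurm.conf (as in the sketch above), and sbatch constraints can be combined as well, e.g. (job.sh here is just a placeholder):

    sbatch -C 7281 job.sh          # only EPYC 7281 nodes
    sbatch -C "7281|6376" job.sh   # either model is acceptable
    sbatch -C avx2 job.sh          # any node tagged with the avx2 feature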

Using your example, if you never want to mix AVX2 and AVX512 processors in the same job, you can "lie" to Slurm in your topology file and come up with a topology where the two subsets of nodes can't talk to each other. That way, Slurm will not mix nodes of the different instruction sets. The problem with this is that it's a "permanent" solution - it's not flexible. I would imagine there are times when you would want to use both your AVX2 and AVX512 processors in a single job.
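
For example, a topology.conf along these lines (switch and node names invented for illustration) keeps the two groups apart, because no switch connects them:

    # topology.conf sketch - with topology/tree, a job can't be placed
    # across switches that have no common parent
    SwitchName=sw_avx2   Nodes=avx2node[001-064]
    SwitchName=sw_avx512 Nodes=avx512node[001-032]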

I do something like this because we have 10 nodes set aside for serial jobs that are connected only by 1 GbE. We obviously don't want internode jobs running there, so in my topology file, each of those nodes has its own switch that's not connected to any other switch.
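
In topology.conf that looks roughly like this (names again made up):

    # each serial node sits behind its own fake switch, so Slurm will
    # never place a multi-node job on them
    SwitchName=serialsw01 Nodes=serialnode01
    SwitchName=serialsw02 Nodes=serialnode02
    # ... one line per serial node, through serialsw10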


If you want to continue down the road you've already started on, can you
provide more information, like the partition definitions and the gres
definitions? In general, Slurm should support submitting to multiple
partitions.
As far as I understood it, you can give a comma-separated list of
partitions to sbatch, but it's not possible to do this by default?


Incorrect. Giving a comma-separated list is possible and is the default behavior for Slurm. From the sbatch documentation (emphasis added to the relevant sentence):

*-p*, *--partition*=</partition_names/>
    Request a specific partition for the resource allocation. If not
    specified, the default behavior is to allow the slurm controller
    to select the default partition as designated by the system
    administrator. *If the job can use more than one partition,
    specify their names in a comma separate list and the one offering
    earliest initiation will be used with no regard given to the
    partition name ordering (although higher priority partitions will
    be considered first).* When the job is initiated, the name of the
    partition used will be placed first in the job record partition
    string.
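
In practice it's just something like this (partition names are made up):

    sbatch -p general,serial,debug job.sh
    # the job starts in whichever listed partition can run it earliest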

You can't have a job *span* multiple partitions, but I don't think that was ever your goal.


Prentice
