Re: [slurm-users] Limit number of jobs on shared nodes?

Paul Edmon Fri, 04 May 2018 07:06:52 -0700

You might try using Partition QoSs; those can do a bunch of neat things.


-Paul Edmon-


On 05/04/2018 09:59 AM, Liam Forbes wrote:

We have three "big memory" nodes. We'd like to limit the number of jobs that run per node in two partitions that share these nodes. Jobs in these two partitions are limited to a single node. We'd like 8 or fewer jobs from each partition to run per node, so at most 16 jobs should be allowed to share a given node.
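To pin down the rule being asked for, here is a minimal Python sketch of the intended admission check. The function `can_start` and its constants are hypothetical illustrations of the policy only; they are not a Slurm API, plugin, or configuration option.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical policy: at most 8 running jobs per partition on each shared node.
# This only illustrates the rule; it is not how Slurm enforces anything.
MAX_JOBS_PER_PARTITION_PER_NODE = 8
SHARED_PARTITIONS = ("analysis", "bio")


def can_start(running: list[tuple[str, str]], node: str, partition: str) -> bool:
    """Return True if one more job from `partition` may start on `node`.

    `running` lists (node, partition) pairs for jobs already running.
    """
    counts = Counter(running)  # missing keys count as 0
    return counts[(node, partition)] < MAX_JOBS_PER_PARTITION_PER_NODE


if __name__ == "__main__":
    # Eight analysis jobs already on n144: a ninth is refused, a bio job is not.
    running = [("n144", "analysis")] * 8
    print(can_start(running, "n144", "analysis"))  # False
    print(can_start(running, "n144", "bio"))       # True
    # Per-node ceiling across both partitions: 2 * 8 = 16.
    print(len(SHARED_PARTITIONS) * MAX_JOBS_PER_PARTITION_PER_NODE)  # 16
```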
Currently, we have
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
in our slurm.conf.

The nodes are defined as:
NodeName=n[144-146] NodeAddr=10.50.50.[144-146] CPUs=56 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=1500000 State=UNKNOWN
The two partitions are defined as:
PartitionName=analysis Nodes=n[144-146] MaxTime=4-0:0 MaxNodes=1 State=UP AllowGroups=all Priority=100 OverSubscribe=FORCE:4 Hidden=NO Default=NO
PartitionName=bio Nodes=n[144-146] MaxTime=14-0:0 MaxNodes=1 State=UP AllowGroups=all Priority=100 OverSubscribe=FORCE:4 Hidden=NO Default=NO
We discovered the hard way that this means users can run 4 jobs on each of the 56 CPUs/threads on each node. Oops! Not what we intended.
All our other compute nodes are defined as exclusive, and we don't allow multiple jobs to run on them.
Any recommendations on how to implement the limit we'd like of 8 jobs per partition per node? Should we switch our SelectTypeParameters to CR_Socket or CR_Socket_Memory, for example?
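One rough way to see why the current settings allow so many jobs, and what CR_Socket might change, is to multiply the allocation units per node by the FORCE count. This is a back-of-the-envelope Python sketch, assuming (per the OverSubscribe description in the slurm.conf documentation) that FORCE:4 lets each allocatable unit be shared four ways, that every job fits in a single unit, and that memory limits (the _Memory variants) are ignored. It is not a model of Slurm's scheduler, and `max_jobs_per_node` is a hypothetical helper.

```python
# Back-of-the-envelope ceiling on concurrent jobs per node under
# select/cons_res with OverSubscribe=FORCE:<count>.
# Assumptions (not verified against Slurm): each job needs exactly one
# allocation unit, each unit can be shared <count> ways, memory is ignored.

# Node topology from the n[144-146] definition.
SOCKETS = 2
CORES_PER_SOCKET = 14
THREADS_PER_CORE = 2
CPUS = 56

# Allocation units per node for the SelectTypeParameters values discussed.
# CR_Socket_Memory has the same unit count as CR_Socket plus a memory limit.
UNITS = {
    "CR_CPU": SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE,  # hardware threads
    "CR_Socket": SOCKETS,                                      # whole sockets
}


def max_jobs_per_node(select_param: str, force_count: int) -> int:
    """Units per node times the OverSubscribe=FORCE sharing factor."""
    return UNITS[select_param] * force_count


if __name__ == "__main__":
    # Sanity check: the topology multiplies out to the declared CPUs=56.
    assert UNITS["CR_CPU"] == CPUS
    for param in ("CR_CPU", "CR_Socket"):
        print(f"{param}: FORCE:4 allows up to {max_jobs_per_node(param, 4)} jobs per node")
```

Under these assumptions the CR_CPU case reproduces the 4 jobs per CPU described in the message (4 × 56 = 224), while CR_Socket gives 2 × 4 = 8. How that ceiling applies when analysis and bio share the same sockets is not modeled here and would need testing on the actual nodes.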
--
Regards,
-liam

-There are uncountably more irrational fears than rational ones. -P. Dolan
Liam Forbes lofor...@alaska.edu ph: 907-450-8618 fax: 907-450-8601
UAF Research Computing Systems Senior HPC Engineer        CISSP
