*-O*, *--overcommit*
Overcommit resources. When applied to job allocation, only one CPU
is allocated to the job per node and options used to specify the
number of tasks per node, socket, core, etc. are ignored. When
applied to job step allocations (the *srun* command when executed
within an existing job allocation), this option can be used to
launch more than one task per CPU. Normally, *srun* will not
allocate more than one process per CPU. By specifying *--overcommit*
you are explicitly allowing more than one process per CPU. However,
no more than *MAX_TASKS_PER_NODE* tasks are permitted to execute per
node. NOTE: *MAX_TASKS_PER_NODE* is defined in the file *slurm.h* and
is not a variable; it is set at Slurm build time.
I have used this successfully to run more jobs than there are CPUs/cores
available, e.g.:
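Something along these lines (a sketch based on the command below; the task
count of 768 is illustrative, it just needs to exceed the 128 CPUs per node
while staying under MAX_TASKS_PER_NODE, and I've dropped --exclusive):

srun --overcommit --nodes 3 --ntasks 768 /ddos/demo/showproc.sh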
-e.
Karl Lovink wrote:
Hello,
I am in the process of setting up our SLURM environment. We want to use
SLURM during our DDoS exercises for dispatching DDoS attack scripts. We
need a lot of jobs running in parallel on a total of 3 nodes. I can't get
it to run more than 128 jobs simultaneously. There are 128 CPUs in the
compute nodes.
How can I ensure that I can run more jobs in parallel than there are
CPUs in the compute nodes?
Thanks
Karl
My srun script is:
srun --exclusive --nodes 3 --ntasks 384 /ddos/demo/showproc.sh
And my slurm.conf file:
ClusterName=ddos-cluster
ControlMachine=slurm
SlurmUser=ddos
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/opt/slurm/spool/ctld
SlurmdSpoolDir=/opt/slurm/spool/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/opt/slurm/run/.pid
SlurmdPidFile=/opt/slurm/run/slurmd.pid
ProctrackType=proctrack/pgid
PluginDir=/opt/slurm/lib/slurm
ReturnToService=2
TaskPlugin=task/none
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
SlurmctldDebug=3
SlurmctldLogFile=/opt/slurm/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/opt/slurm/log/slurmd.log
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/none
AccountingStorageTRES=gres/gpu
DebugFlags=CPU_Bind,gres
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageUser=slurm
SlurmctldParameters=enable_configless
GresTypes=gpu
DefMemPerNode=256000
NodeName=aivd CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
NodeName=mivd CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
NodeName=fiod CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
PartitionName=ddos Nodes=ALL Default=YES MaxTime=INFINITE State=UP
PartitionName=adhoc Nodes=ALL Default=YES MaxTime=INFINITE State=UP