Hi Durai,

I see the same thing as you on our test-cluster, which has
ThreadsPerCore=2
configured in slurm.conf.

The double-foo goes away with this:
srun --cpus-per-task=1 --hint=nomultithread echo foo
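
A quick way to check is to print the resulting binding the same way as in the loop further down; this is only a sketch, the exact mask of course depends on the node you land on:

srun --cpus-per-task=1 --hint=nomultithread bash -c 'echo $SLURM_CPU_BIND_LIST'

If only one task is launched, the mask (and hence the echo) shows up only once.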

Having multithreading enabled leads, imho, to surprising behaviour in Slurm. My impression is that it makes the concept of "a CPU" in Slurm somewhat fuzzy: when you use the cpu-related options of srun, sbatch and salloc, it becomes ambiguous whether you get a CPU-core or a CPU-thread.
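
To see what Slurm counts on a node you can look at the node definition; a sketch using the node from your slurm.conf as an example:

scontrol show node slurm-bm-58 | grep -Eo '(Sockets|CoresPerSocket|ThreadsPerCore|CPUTot)=[0-9]+'

With your configuration (Procs=72, ThreadsPerCore=2) the CPU count is threads, while SelectTypeParameters=CR_Core_Memory allocates in units of whole cores.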

I think what you found is a bug.

If you run

for c in {4..1}
do
 echo "## $c ###"
 srun -c $c bash -c 'echo $SLURM_CPU_BIND_LIST'
done

you will get:

## 4 ###
0x003003
## 3 ###
0x003003
## 2 ###
0x001001
## 1 ###
0x000001,0x001000
0x000001,0x001000

You see: requesting 4 and 3 CPUs results in the same cpu-binding, because both need two CPU-cores with 2 threads each. In the "3" case one of the threads stays unused, but of course it is not free for another job. In the "1" case I would expect to see the same binding as in the "2" case. If you combine (OR) the two values in the list you *do* get the same mask, but obviously it is a list of two values, and this might be the origin of the problem.
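
You can check that combination quickly in a shell; this is just plain integer arithmetic, nothing Slurm-specific:

printf '0x%06x\n' $(( 0x000001 | 0x001000 ))    # prints 0x001001, the mask from the "2" case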

It is probably related to what's mentioned in the documentation for '--ntasks': "[...] The default is one task per node, but note that the --cpus-per-task option will change this default."
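
So until this is sorted out, explicitly pinning the task count, as you already did in your third example, looks like a safe workaround:

srun -n1 --cpus-per-task=1 echo foo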

Regards
Hermann

On 3/24/22 1:37 PM, Durai Arasan wrote:
Hello Slurm users,

We are experiencing strange behavior: srun executes commands twice, but only when --cpus-per-task=1 is set:

$ srun --cpus-per-task=1 --partition=gpu-2080ti echo foo
srun: job 1298286 queued and waiting for resources
srun: job 1298286 has been allocated resources
foo
foo

This is not seen when --cpus-per-task is another value:

$ srun --cpus-per-task=3 --partition=gpu-2080ti echo foo
srun: job 1298287 queued and waiting for resources
srun: job 1298287 has been allocated resources
foo

The duplication is also not seen when --ntasks is specified explicitly:
$ srun -n1 --cpus-per-task=1 --partition=gpu-2080ti echo foo
srun: job 1298288 queued and waiting for resources
srun: job 1298288 has been allocated resources
foo

Relevant slurm.conf settings are:
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
# example node configuration
NodeName=slurm-bm-58 NodeAddr=xxx.xxx.xxx.xxx Procs=72 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=354566 Gres=gpu:rtx2080ti:8 Feature=xx_v2.38 State=UNKNOWN

On closer inspection of the job's environment variables in the "--cpus-per-task=1" case, the following variables have wrongly acquired a value of 2:
SLURM_NTASKS=2
SLURM_NPROCS=2
SLURM_TASKS_PER_NODE=2
SLURM_STEP_NUM_TASKS=2
SLURM_STEP_TASKS_PER_NODE=2

Can you see what could be wrong?

Best,
Durai
