Hi,

At our site we have recently upgraded to Slurm 23.11.5 and are having trouble 
with MPI jobs that run srun inside an sbatch'ed script.

The cgroup does not appear to be set up correctly for the srun step (step_0).

As an example:
$ cat /sys/fs/cgroup/cpuset/slurm/uid_11000..../job..../cpuset.cpus
0,2-3,68-69,96,98-99,164-165
$ cat /sys/fs/cgroup/cpuset/slurm/uid_11000..../job..../step_0/cpuset.cpus
0,2,68,96,98,164

The sbatch job is allocated several ranges of CPUs in its cgroup. However, when 
step_0 runs, only some of those CPUs are present in the step's cgroup.
I have noticed that it is always the remainder of each range which goes 
missing, i.e. for 2-5 only 2 is included and 3,4,5 are missing.
This also only happens if there are multiple groups of CPUs in the allocation, 
i.e. an allocation of just 1-12 would be fine, but 1-12,15-20 would result in 
only 1,15.
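
For what it's worth, a stripped-down way to see it is something like the 
following (the task and CPU counts are placeholders; anything that produces a 
non-contiguous allocation shows it for us, and the cgroup path follows the v1 
layout shown above, uid_<uid>/job_<jobid>/step_0):

$ cat repro.sh
#!/bin/bash
#SBATCH --ntasks=6            # placeholder values; any combination that
#SBATCH --cpus-per-task=2     # yields multiple CPU ranges reproduces it
CG=/sys/fs/cgroup/cpuset/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}
echo "job cpuset: $(cat ${CG}/cpuset.cpus)"
# step_0 is created by this srun; every rank prints the same step cpuset,
# so de-duplicate the output
srun cat ${CG}/step_0/cpuset.cpus | sort -u
$ sbatch repro.sh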

The batch job itself also seems fine, with step_batch and step_extern being 
allocated the CPUs correctly.

This causes numerous issues with MPI jobs, as they end up oversubscribing CPUs.


We are running our nodes with threading (SMT) enabled on the CPUs, and with the 
cgroup and affinity task plugins.

I have attached our slurm.conf to show our settings.
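
The task-related settings are of this general form (illustrative excerpt only; 
the exact values are in the attachment, and the ProctrackType line is an 
assumption about the usual pairing rather than a quote from our file):

TaskPlugin=task/affinity,task/cgroup
ProctrackType=proctrack/cgroup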

Our /etc/slurm/cgroup.conf is:
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes


We have turned on logging at the debug2 level but haven't yet found anything 
useful; I'm happy to take suggestions on what to look for.
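
In case it matters, the sort of commands in play are below (the nodes= form and 
the CPU_Bind flag are from memory, so treat them as approximate rather than 
exact):

$ scontrol setdebug debug2                # controller side
$ scontrol setdebug debug2 nodes=<node>   # slurmd on the affected node (placeholder nodename)
$ scontrol setdebugflags +CPU_Bind        # a guess at a flag that may log cpu-bind/cpuset decisions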


Is anyone able to provide advice on where to go next to try to identify the 
issue?

Regards,
Ashley Wright

Attachment: slurm.conf
Description: slurm.conf
