Hmm. Actually, this looks like confusion between the CPU IDs on the system
and what SLURM thinks the IDs are.
# scontrol -d show job 8
...
Nodes=foobar CPU_IDs=14-21 Mem=25600 GRES=
...
# cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_8/cpuset.cpus.effective
7-10,39-42
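
For what it's worth, the two lists line up if one assumes that SLURM numbers the two threads of a core consecutively (abstract IDs 2c and 2c+1 for core c), while the kernel numbers the second thread of core c as c+32 on this 2 x 16 core node. That numbering is an assumption, not something confirmed on this machine, and the helper names `expand` and `abstract_to_machine` are made up for the sketch:

```python
def expand(spec):
    """Expand a CPU list such as '7-10,39-42' into a sorted list of ints."""
    ids = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return sorted(ids)


def abstract_to_machine(abstract_id, total_cores=32, threads_per_core=2):
    """Map an abstract CPU ID to a machine CPU ID.

    ASSUMPTION: abstract IDs count threads within a core first, and the
    kernel numbers thread t of core c as c + t * total_cores.
    """
    core, thread = divmod(abstract_id, threads_per_core)
    return core + thread * total_cores


# CPU_IDs=14-21 as reported by "scontrol -d show job 8"
machine_ids = sorted(abstract_to_machine(i) for i in expand("14-21"))
print(machine_ids)  # [7, 8, 9, 10, 39, 40, 41, 42]
```

Under the same assumption, abstract IDs 0-13 would correspond to machine CPUs 0-6 and 32-38, which do not overlap the job's cpuset.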
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
On Tue, 13 Dec 2022 9:40am, Paul Raines wrote:
Oh but that does explain the CfgTRES=cpu=14. With the CpuSpecList
below and SlurmdOffSpec I do get CfgTRES=cpu=50 so that makes sense.
The issue remains that, though the number of CPUs in CpuSpecList
is taken into account, the exact IDs seem to be ignored.
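
A quick way to see this is to compare the job's allowed CPUs (7-10,39-42) against each CpuSpecList value tried in this thread, reading both as kernel CPU numbers. That reading is an assumption, and the helper names `expand` and `split_by_spec` are only for illustration:

```python
def expand(spec):
    """Expand a CPU list such as '7-10,39-42' into a sorted list of ints."""
    ids = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return sorted(ids)


def split_by_spec(allowed, spec_list):
    """Return (inside, outside): allowed CPUs in and not in spec_list."""
    reserved = set(expand(spec_list))
    cpus = expand(allowed)
    inside = [c for c in cpus if c in reserved]
    outside = [c for c in cpus if c not in reserved]
    return inside, outside


# ASSUMPTION: both lists are read as kernel (machine) CPU numbers.
for spec_list in ("14-63", "0-13"):
    inside, outside = split_by_spec("7-10,39-42", spec_list)
    print(spec_list, "inside:", inside, "outside:", outside)
```

With either setting, half of the job's CPUs fall inside the list and half outside.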
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
On Tue, 13 Dec 2022 9:34am, Paul Raines wrote:
I have tried it both ways with the same result. The assigned CPUs
fall both inside and outside the range given to CpuSpecList.
I also tried commas instead of ranges, so I used
CpuSpecList=0,1,2,3,4,5,6,7,8,9,10,11,12,13
but it still does not work:
$ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
--time=10:00:00 --cpus-per-task=8 --pty /bin/bash
$ grep -i ^cpu /proc/self/status
Cpus_allowed: 00000780,00000780
Cpus_allowed_list: 7-10,39-42
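
As a cross-check, the Cpus_allowed mask can be decoded by hand. This is a small sketch, assuming the usual /proc layout of comma-separated 32-bit hex words with the most significant word first. The function name `mask_to_cpus` is made up here:

```python
def mask_to_cpus(mask):
    """Decode a Cpus_allowed mask like '00000780,00000780' into CPU numbers.

    ASSUMPTION: each comma-separated word covers 32 CPUs and the
    rightmost word holds CPUs 0-31.
    """
    cpus = []
    for index, word in enumerate(reversed(mask.split(","))):
        value = int(word, 16)
        cpus.extend(index * 32 + bit for bit in range(32) if (value >> bit) & 1)
    return cpus


print(mask_to_cpus("00000780,00000780"))  # [7, 8, 9, 10, 39, 40, 41, 42]
```

This agrees with Cpus_allowed_list: 0x780 sets bits 7 through 10 in each word.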
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
On Mon, 12 Dec 2022 10:21am, Sean Maxwell wrote:
Hi Paul,
Nodename=foobar \
CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1
The slurm.conf also has:
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
TaskPluginParam=Cores,SlurmdOffSpec,Verbose
Doesn't setting SlurmdOffSpec tell slurmd that it should NOT use the CPUs
in the spec list? (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec)
In this case, I believe it uses what is left, which is the 0-13. We are
just starting to work on this ourselves, and were looking at this
setting.
Best,
-Sean