Nice find. Thanks for sharing back.

On Tue, Dec 13, 2022 at 10:39 AM Paul Raines <rai...@nmr.mgh.harvard.edu> wrote:
> Yes, looks like SLURM is using the apicid that is in /proc/cpuinfo.
> The first 14 cpus in /proc/cpuinfo (procs 0-13) have apicid
> 0,2,4,6,8,10,12,14,16,20,22,24,26,28 in /proc/cpuinfo
>
> So after setting CpuSpecList=0,2,4,6,8,10,12,14,16,18,20,22,24,26
> in slurm.conf it appears to be doing what I want
>
> $ echo $SLURM_JOB_ID
> 9
> $ grep -i ^cpu /proc/self/status
> Cpus_allowed:      000f0000,000f0000
> Cpus_allowed_list: 16-19,48-51
> $ scontrol -d show job 9 | grep CPU_ID
> Nodes=larkin CPU_IDs=32-39 Mem=25600 GRES=
>
> apicid=32 is processor=16 and apicid=33 is processor=48 in /proc/cpuinfo
>
> Thanks
>
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
> On Tue, 13 Dec 2022 9:52am, Sean Maxwell wrote:
>
> > In the slurm.conf manual they state the CpuSpecList ids are "abstract",
> > and in the CPU management docs they enforce the notion that the abstract
> > Slurm IDs are not related to the Linux hardware IDs, so that is probably
> > the source of the behavior. I unfortunately don't have more information.
> >
> > On Tue, Dec 13, 2022 at 9:45 AM Paul Raines <rai...@nmr.mgh.harvard.edu>
> > wrote:
> >
> >> Hmm. Actually it looks like there is confusion between the CPU IDs on
> >> the system and what SLURM thinks the IDs are:
> >>
> >> # scontrol -d show job 8
> >> ...
> >> Nodes=foobar CPU_IDs=14-21 Mem=25600 GRES=
> >> ...
> >>
> >> # cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_8/cpuset.cpus.effective
> >> 7-10,39-42
> >>
> >>
> >> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
> >>
> >>
> >> On Tue, 13 Dec 2022 9:40am, Paul Raines wrote:
> >>
> >> > Oh, but that does explain the CfgTRES=cpu=14. With the CpuSpecList
> >> > below and SlurmdOffSpec I do get CfgTRES=cpu=50, so that makes sense.
> >> >
> >> > The issue remains that though the number of cpus in CpuSpecList
> >> > is taken into account, the exact IDs seem to be ignored.
> >> >
> >> >
> >> > -- Paul Raines (http://help.nmr.mgh.harvard.edu)
> >> >
> >> >
> >> > On Tue, 13 Dec 2022 9:34am, Paul Raines wrote:
> >> >
> >> >> I have tried it both ways with the same result. The assigned CPUs
> >> >> will be both in and out of the range given to CpuSpecList.
> >> >>
> >> >> I tried setting it using commas instead of ranges, so I used
> >> >>
> >> >>   CpuSpecList=0,1,2,3,4,5,6,7,8,9,10,11,12,13
> >> >>
> >> >> but it still does not work:
> >> >>
> >> >> $ srun -p basic -N 1 --ntasks-per-node=1 --mem=25G \
> >> >>     --time=10:00:00 --cpus-per-task=8 --pty /bin/bash
> >> >> $ grep -i ^cpu /proc/self/status
> >> >> Cpus_allowed:      00000780,00000780
> >> >> Cpus_allowed_list: 7-10,39-42
> >> >>
> >> >>
> >> >> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
> >> >>
> >> >>
> >> >> On Mon, 12 Dec 2022 10:21am, Sean Maxwell wrote:
> >> >>
> >> >>> Hi Paul,
> >> >>>
> >> >>>> Nodename=foobar \
> >> >>>>    CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
> >> >>>>    RealMemory=256312 MemSpecLimit=32768 CpuSpecList=14-63 \
> >> >>>>    TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1
> >> >>>>
> >> >>>> The slurm.conf also has:
> >> >>>>
> >> >>>> ProctrackType=proctrack/cgroup
> >> >>>> TaskPlugin=task/affinity,task/cgroup
> >> >>>> TaskPluginParam=Cores,SlurmdOffSpec,Verbose
> >> >>>
> >> >>> Doesn't setting SlurmdOffSpec tell Slurmd that it should NOT use the
> >> >>> CPUs in the spec list?
> >> >>> (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec)
> >> >>> In this case, I believe it uses what is left, which is the 0-13. We
> >> >>> are just starting to work on this ourselves, and were looking at
> >> >>> this setting.
> >> >>>
> >> >>> Best,
> >> >>>
> >> >>> -Sean
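For anyone wanting to reproduce the processor-to-apicid mapping Paul describes, here is a
minimal sketch. It assumes the usual /proc/cpuinfo layout and that, as observed on this
node, Slurm's abstract CPU IDs track the apicid values; the cutoff of 14 processors is
just the example from this thread, so adjust for your own node:

  $ awk -F': ' '/^processor/ {p = $2}
                /^apicid/    {print "processor=" p, "apicid=" $2}' /proc/cpuinfo
  processor=0 apicid=0
  processor=1 apicid=2
  ...

To turn that into a CpuSpecList covering the apicids of the first 14 processors:

  $ awk -F': ' '/^processor/ {p = $2 + 0}
                /^apicid/ && p < 14 {list = list (list ? "," : "") $2}
                END {print "CpuSpecList=" list}' /proc/cpuinfo
  CpuSpecList=0,2,4,...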
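And pulling the pieces of this thread together, a sketch of what the node definition
looks like with the apicid-based CpuSpecList Paul reported working. The hostname, memory,
GRES, and TaskPlugin lines are simply the values quoted above, so treat them as
placeholders for your own hardware:

  # CpuSpecList IDs are "abstract" per the slurm.conf manual; here they are the
  # apicid values of the first 14 cores observed on this particular node
  NodeName=foobar \
     CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 \
     RealMemory=256312 MemSpecLimit=32768 \
     CpuSpecList=0,2,4,6,8,10,12,14,16,18,20,22,24,26 \
     TmpDisk=6000000 Gres=gpu:nvidia_rtx_a6000:1

  # SlurmdOffSpec keeps slurmd itself off the CPUs listed in CpuSpecList
  ProctrackType=proctrack/cgroup
  TaskPlugin=task/affinity,task/cgroup
  TaskPluginParam=Cores,SlurmdOffSpec,Verbose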