Hi Chris,

Thanks for following up on this thread.

First of all, you will want to use cgroups to ensure that processes that do
not request GPUs cannot access them.


We had a feeling that cgroups might be the better approach. Could you point us to documentation that indicates cgroups are actually a requirement?
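In case it helps to make the question concrete: as far as we understand, the cgroup-based device confinement would look roughly like the fragments below. This is only a sketch of what we think is meant; the option names should be checked against the cgroup.conf and slurm.conf man pages for the Slurm version in use.

```
# slurm.conf (fragment) -- enable cgroup-based process tracking
# and task confinement
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf (fragment) -- restrict jobs to the devices they
# were allocated (GPUs listed in gres.conf), so jobs that did
# not request a GPU cannot open /dev/nvidia*
ConstrainDevices=yes
```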

Secondly, do your CPUs have hyperthreading enabled by some chance?
If so then your gres.conf is likely wrong as you'll want to list the first HT
on each core that you want to restrict access to.

No HT involved here at any point, neither on our cluster nor within the Dockerized Slurm installation I was playing with.

 From the manual page for gres.conf:

               NOTE: If your cores contain multiple threads only list the
               first thread of each core. The logic is such that it uses
               core instead of thread scheduling per GRES. Also note that
               since Slurm must be able to perform resource management on
               heterogeneous clusters having various core ID numbering
               schemes, an abstract index will be used instead of the
               physical core index. That abstract id may not correspond to
               your physical core number. Basically Slurm starts numbering
               from 0 to n, being 0 the id of the first processing unit
               (core or thread if HT is enabled) on the first socket, first
               core and maybe first thread, and then continuing sequentially
               to the next thread, core, and socket. The numbering generally
               coincides with the processing unit logical number (PU L#)
               seen in lstopo output.


We are aware of this section of the manpage, thanks.
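For reference, on a node without hyperthreading the gres.conf entries in question would look something like the sketch below. The node name, device paths, and core ranges are made up for illustration; note also that older Slurm releases spell the core binding option `CPUs=` rather than `Cores=`, so check the man page for your version.

```
# gres.conf (fragment) -- node without HT, so the abstract
# core IDs 0-7 map one-to-one to processing units.
# Each GPU is pinned to the cores closest to it.
NodeName=gpunode01 Name=gpu File=/dev/nvidia0 Cores=0-3
NodeName=gpunode01 Name=gpu File=/dev/nvidia1 Cores=4-7
```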

Best,
Peter
