The SLURM controller AND all the compute nodes need to know about every
node in the cluster. If you add a node or a node changes its IP address, you
need to let all the nodes know about it, which, for me, usually means
restarting slurmd on the compute nodes.
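As an aside, when working out which nodes need slurmd restarted, it helps to see exactly which hostnames a NodeName range covers. SLURM can do this itself with `scontrol show hostnames dlt[01-12]`. The Python below is only a rough sketch of the same idea: the function name `expand_hostlist` is made up for illustration, and it handles a single bracket group only, not the full SLURM hostlist syntax.

```python
import re
from typing import List


def expand_hostlist(expr: str) -> List[str]:
    """Expand a simple range such as 'dlt[01-12]' into hostnames.

    Hypothetical helper: supports one bracket group holding
    comma-separated numbers or ranges. Real SLURM hostlists allow more.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]  # no bracket group: treat as a single hostname
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a lone number is a range of one
        width = len(lo)  # keep zero padding, e.g. 01 rather than 1
        for n in range(int(lo), int(hi) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


nodes = expand_hostlist("dlt[01-12]")
print(len(nodes), nodes[0], nodes[-1])
```

The resulting list could be fed to whatever you use to run a command across nodes. Restarting slurmd itself still has to happen on each compute node.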
I just say this because I get caught by th
understand how with
"shared=exclusive" srun gives one result and sbatch gives another.
Tim
On Wed, May 19, 2021 at 11:26 AM Tim Carlson wrote:
Hey folks,
Here is my setup:
slurm-20.11.4 on x86_64 running CentOS 7.x with CUDA 11.1
The relevant parts of the slurm.conf and a particular gres.conf file are:
SelectType=select/cons_res
SelectTypeParameters=CR_Core
PriorityType=priority/multifactor
GresTypes=gpu
NodeName=dlt[01-12] Gr