Hello,

We are installing a group of nodes, each of which contains 4 GPU cards. The GPUs
are paired together using NVLINK, as shown in the topology matrix below.

We are familiar with using Slurm to schedule and run jobs on GPU cards, but
this is the first time we have dealt with NVLINK-enabled GPUs. Could someone
please advise us on how to configure Slurm so that we can submit jobs to the
cards and make use of the NVLINK? That is, what do we need to put in gres.conf
or slurm.conf, and how should users invoke sbatch? I presume, for example, that
a job could be allocated one GPU and potentially also use the memory of its
NVLINK-paired card.
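
For illustration, here is our current guess at the configuration, pieced
together from the gres.conf and slurm.conf man pages and the topology matrix
pasted below my signature. It is untested, the device paths and node line are
assumptions, and I am not sure whether Cores= wants Slurm core indices rather
than the logical CPU IDs that nvidia-smi prints:

# gres.conf, option A: let slurmd detect the topology itself via NVML
# (requires Slurm built against the NVML library)
AutoDetect=nvml

# gres.conf, option B: spell it out by hand. Links gives the number of
# NVLink connections to each other GPU, with -1 marking the device itself,
# so GPU0/GPU1 and GPU2/GPU3 are the NV2 pairs from the matrix.
Name=gpu File=/dev/nvidia0 Cores=0,2,4,6,8,10 Links=-1,2,0,0
Name=gpu File=/dev/nvidia1 Cores=0,2,4,6,8,10 Links=2,-1,0,0
Name=gpu File=/dev/nvidia2 Cores=1,3,5,7,9,11  Links=0,0,-1,2
Name=gpu File=/dev/nvidia3 Cores=1,3,5,7,9,11  Links=0,0,2,-1

# slurm.conf: declare the GRES type and the per-node GPU count
GresTypes=gpu
NodeName=alpha51 Gres=gpu:4 ...

# user side: with Links populated, my understanding is that asking for two
# GPUs should preferentially get an NVLINK-connected pair
sbatch --gres=gpu:2 job.sh

Does that look roughly right, or are we missing something?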

Best regards,
David

[root@alpha51 ~]# nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
GPU0     X      NV2     SYS     SYS     0,2,4,6,8,10    0
GPU1    NV2      X      SYS     SYS     0,2,4,6,8,10    0
GPU2    SYS     SYS      X      NV2     1,3,5,7,9,11    1
GPU3    SYS     SYS     NV2      X      1,3,5,7,9,11    1
