I’m fairly sure you set this up the same way you would for a peer-to-peer 
setup. Here’s ours:

[root@cuda001 ~]# nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    mlx4_0  CPU Affinity
GPU0     X      PIX     SYS     SYS     PHB     0-11
GPU1    PIX      X      SYS     SYS     PHB     0-11
GPU2    SYS     SYS      X      PIX     SYS     12-23
GPU3    SYS     SYS     PIX      X      SYS     12-23
mlx4_0  PHB     PHB     SYS     SYS      X 

[root@cuda001 ~]# cat /etc/slurm/gres.conf 

…

# 2 x K80 (perceval)
NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[0-1] CPUs=0-11
NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[2-3] CPUs=12-23
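
For NVLink specifically, gres.conf (Slurm 19.05 and later) also takes a Links
parameter giving the number of NVLink connections between each pair of GPUs,
which lets cons_tres co-schedule the better-connected devices. A sketch for
the 4-GPU layout in your matrix below (the NodeName is illustrative, and the
Links values are my reading of the NV2 entries in your output, so treat this
as an assumption to check against the gres.conf man page):

# Hypothetical gres.conf for a node with two NVLinked GPU pairs.
# Links is ordered by GPU index: -1 marks the device itself,
# 2 means two NVLink connections (NV2 in nvidia-smi), 0 no direct link.
NodeName=alpha51 Name=gpu File=/dev/nvidia0 Cores=0,2,4,6,8,10 Links=-1,2,0,0
NodeName=alpha51 Name=gpu File=/dev/nvidia1 Cores=0,2,4,6,8,10 Links=2,-1,0,0
NodeName=alpha51 Name=gpu File=/dev/nvidia2 Cores=1,3,5,7,9,11 Links=0,0,-1,2
NodeName=alpha51 Name=gpu File=/dev/nvidia3 Cores=1,3,5,7,9,11 Links=0,0,2,-1

If your slurmd is built against NVML, a single line should instead pull the
core affinity and link topology straight from the driver:

AutoDetect=nvml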

This also seems to be related:

https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf
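
On the submission side I don’t believe anything NVLink-specific is needed:
once Slurm knows the links, a job that requests two GPUs on one node should
be placed on a linked pair. A minimal sketch (my_gpu_app and the core count
are assumptions; adjust to your site):

#!/bin/bash
# Ask for one node and two GPUs; with Links configured, cons_tres
# should prefer the NVLinked pair. 12 cores matches one CPU affinity
# set on a 24-core node like the ones above.
#SBATCH --nodes=1
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=12
srun ./my_gpu_app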

--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Sep 10, 2020, at 11:00 AM, David Baker <d.j.ba...@soton.ac.uk> wrote:
> 
> Hello,
> 
> We are installing a group of nodes which all contain 4 GPU cards. The GPUs 
> are paired together using NVLINK as described in the matrix below. 
> 
> We are familiar with using Slurm to schedule and run jobs on GPU cards, but 
> this is the first time we have dealt with NVLINK-enabled GPUs. Could someone 
> please advise us how to configure Slurm so that we can submit jobs to the 
> cards and take advantage of the NVLINK? That is, what do we need to put in 
> gres.conf or slurm.conf, and how should users invoke sbatch? I presume, for 
> example, that a user could use one GPU card and potentially access memory 
> on the paired card.
> 
> Best regards,
> David
> 
> [root@alpha51 ~]# nvidia-smi topo --matrix
>         GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
> GPU0     X      NV2     SYS     SYS     0,2,4,6,8,10    0
> GPU1    NV2      X      SYS     SYS     0,2,4,6,8,10    0
> GPU2    SYS     SYS      X      NV2     1,3,5,7,9,11    1
> GPU3    SYS     SYS     NV2      X      1,3,5,7,9,11    1
