Hi Ryan,

Thank you very much for your reply. That is useful. We'll see how we get on.

Best regards,
David
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Ryan 
Novosielski <novos...@rutgers.edu>
Sent: 11 September 2020 00:08
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Slurm -- using GPU cards with NVLINK

I’m fairly sure you set this up the same way as a peer-to-peer 
setup. Here’s ours:

[root@cuda001 ~]# nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    mlx4_0  CPU Affinity
GPU0     X      PIX     SYS     SYS     PHB     0-11
GPU1    PIX      X      SYS     SYS     PHB     0-11
GPU2    SYS     SYS      X      PIX     SYS     12-23
GPU3    SYS     SYS     PIX      X      SYS     12-23
mlx4_0  PHB     PHB     SYS     SYS      X

[root@cuda001 ~]# cat /etc/slurm/gres.conf

…

# 2 x K80 (perceval)
NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[0-1] CPUs=0-11
NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[2-3] CPUs=12-23
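
If you want Slurm itself to know about the NVLINK pairing (rather than just 
the CPU affinity), gres.conf also takes a Links= field, and I believe newer 
releases (19.05+) can populate it for you with AutoDetect=nvml if slurmd is 
built against NVML. For the topology in your matrix, a hand-written version 
might look roughly like this -- untested sketch, and note that Slurm's core 
numbering doesn't always match what nvidia-smi prints, so check "slurmd -C" 
first:

# 2 x NVLINK pairs (GPU0<->GPU1, GPU2<->GPU3); -1 marks the device itself,
# 2 is the number of NVLink connections (NV2) to the paired card
NodeName=alpha51 Name=gpu File=/dev/nvidia0 CPUs=0,2,4,6,8,10 Links=-1,2,0,0
NodeName=alpha51 Name=gpu File=/dev/nvidia1 CPUs=0,2,4,6,8,10 Links=2,-1,0,0
NodeName=alpha51 Name=gpu File=/dev/nvidia2 CPUs=1,3,5,7,9,11 Links=0,0,-1,2
NodeName=alpha51 Name=gpu File=/dev/nvidia3 CPUs=1,3,5,7,9,11 Links=0,0,2,-1

With Links defined (and, I think, SelectType=select/cons_tres in slurm.conf), 
the scheduler should prefer to hand a two-GPU job the NVLINK-connected pair 
rather than two unconnected cards.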

This also seems to be related:

https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf
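
On the sbatch side of your question, I don't think users need to do anything 
special once the pairs are described in gres.conf -- they just request GPUs 
and Slurm tries to give them a linked pair. Something along these lines 
(sketch only; the program name is a placeholder):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2                  # ask for two GPUs on one node
#SBATCH --gres-flags=enforce-binding  # run only on the CPUs tied to those GPUs in gres.conf

srun ./my_gpu_program                 # placeholder for the real application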

--
____
|| \\UTGERS,      |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Sep 10, 2020, at 11:00 AM, David Baker <d.j.ba...@soton.ac.uk> wrote:
>
> Hello,
>
> We are installing a group of nodes which all contain 4 GPU cards. The GPUs 
> are paired together using NVLINK as described in the matrix below.
>
> We are familiar with using Slurm to schedule and run jobs on GPU cards, but 
> this is the first time we have dealt with NVLINK-enabled GPUs. Could someone 
> please advise us on how to configure Slurm so that we can submit jobs to the 
> cards and make use of NVLINK? That is, what do we need to put in 
> gres.conf or slurm.conf, and how should users use the sbatch command? I 
> presume, for example, that a user could make use of a GPU card and 
> potentially make use of memory on the paired card.
>
> Best regards,
> David
>
> [root@alpha51 ~]# nvidia-smi topo --matrix
>         GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
> GPU0     X      NV2     SYS     SYS     0,2,4,6,8,10    0
> GPU1    NV2      X      SYS     SYS     0,2,4,6,8,10    0
> GPU2    SYS     SYS      X      NV2     1,3,5,7,9,11    1
> GPU3    SYS     SYS     NV2      X      1,3,5,7,9,11    1
