[slurm-users] A100 MIG in slurm

2021-04-21 Thread Ransom, Geoffrey M.
Hello, we put an A100 card set up as 7 MIG (Multi-Instance GPU) devices into Slurm, but the device files that refer to each MIG instance are not immediately obvious. (16 /dev/nvidia-cap ### devices were created.) Is anyone familiar with this, and does anyone know how to set up MIG devices as cgroup-controlled TRES in s
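
As a first step toward working out which MIG instances exist on a node, one option is to parse the listing printed by `nvidia-smi -L`. The sketch below is only illustrative: the sample text, the placeholder UUIDs, and the helper name `list_mig_devices` are assumptions and not something from this thread, and the exact UUID format differs between driver versions. It does not map instances to /dev/nvidia-cap device files or produce any Slurm configuration.

```python
import re

# Illustrative sample only (assumption): the general shape of `nvidia-smi -L`
# output on a MIG-enabled GPU, shortened to two instances with placeholder UUIDs.
SAMPLE = """\
GPU 0: A100-PCIE-40GB (UUID: GPU-00000000-0000-0000-0000-000000000000)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-00000000-0000-0000-0000-000000000000/7/0)
  MIG 1g.5gb Device 1: (UUID: MIG-GPU-00000000-0000-0000-0000-000000000000/8/0)
"""

# A parent GPU line starts at column 0; MIG instance lines are indented below it.
GPU_LINE = re.compile(r"^GPU\s+(?P<gpu>\d+):")
MIG_LINE = re.compile(
    r"^\s+MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):"
    r"\s+\(UUID:\s+(?P<uuid>[^)]+)\)"
)


def list_mig_devices(text):
    """Return one dict per MIG instance found in `nvidia-smi -L` style text."""
    devices = []
    current_gpu = None
    for line in text.splitlines():
        gpu_match = GPU_LINE.match(line)
        if gpu_match:
            # Remember which physical GPU the following MIG lines belong to.
            current_gpu = int(gpu_match.group("gpu"))
            continue
        mig_match = MIG_LINE.match(line)
        if mig_match:
            devices.append(
                {
                    "gpu": current_gpu,
                    "index": int(mig_match.group("index")),
                    "profile": mig_match.group("profile"),
                    "uuid": mig_match.group("uuid"),
                }
            )
    return devices


if __name__ == "__main__":
    for dev in list_mig_devices(SAMPLE):
        print(dev["gpu"], dev["index"], dev["profile"], dev["uuid"])
```

On a real node the same function could be fed the captured output of `nvidia-smi -L` instead of the sample string; the result is only an inventory of instances and profiles, which still has to be matched to device files by other means.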

Re: [slurm-users] SLURM A100

2021-04-21 Thread Timothy Carr
Dear Kota, Appreciate the feedback. I will read up on the latest documentation when the time comes to configure. Thank you for your detailed email; I will indeed read your blog. Regards Tim From: slurm-users on behalf of Kota Tsuyuzaki Sent: Wednesday, 21 Ap

Re: [slurm-users] {Suspected Spam?} SLURM A100

2021-04-21 Thread Timothy Carr
Hi Ewan, Thank you for the response. Exactly the source of information I was looking for. The 'slurm-mig-discovery' tool is perfect. Cheers Tim From: slurm-users on behalf of Ewan Roche Sent: Wednesday, 21 April 2021 10:12 To: Slurm User Community List Cc:

[slurm-users] SLURM A100

2021-04-21 Thread Timothy Carr
Dear Community, Trust everyone is well and keeping safe? We are considering the purchase of nodes with NVIDIA A100 GPUs and enabling the MIG feature, which allows for the creation of instance resource profiles. The creation of these profiles seems to be straightforward as per the documentat

Re: [slurm-users] SLURM A100

2021-04-21 Thread Kota Tsuyuzaki
Hello Tim, Last year, I worked out how the A100 MIG feature behaves with the Slurm Workload Manager. At that time, it required a non-default DEVFS mode in the kernel config to constrain the MIG devices via the Slurm cgroup. With that setting, A100 MIG works well for me, so I suppose it should NOT be blocki

Re: [slurm-users] {Suspected Spam?} SLURM A100

2021-04-21 Thread Ewan Roche
Hi Tim, we have MIG configured and integrated with Slurm using the slurm-mig-discovery tools: https://gitlab.com/nvidia/hpc/slurm-mig-discovery The mig-parted tool is great for setting up MIG itself: https://github.com/NVIDIA/mig-parted Once set up, MIG instances work fine with Slurm, although th