Hello
We added an A100 card set up as 7 MIG (Multi-Instance GPU) devices to Slurm,
but which device files refer to which MIG instance is not immediately obvious.
(16 /dev/nvidia-cap### devices were created.)
Is anyone familiar with this and knows how to set up MIG devices as
cgroup-controlled TRES in Slurm?
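For reference, a hedged sketch of where those capability minors can be read
out (per the NVIDIA MIG documentation; this assumes the first GPU appears as
gpu0 in procfs):

    $ nvidia-smi mig -lgi    # list the GPU instances (GIs) and their IDs
    $ grep DeviceFileMinor \
          /proc/driver/nvidia/capabilities/gpu0/mig/gi*/access \
          /proc/driver/nvidia/capabilities/gpu0/mig/gi*/ci*/access
    # each reported minor N corresponds to /dev/nvidia-cap<N>

With a 7-way split that is presumably 7 GI access files plus 7 CI access
files, with the global mig/config and mig/monitor capabilities accounting for
the remaining two of the 16 devices.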
Dear Kota,
Appreciate the feedback. I will read up on the latest documentation when the
time comes to configure. Thank you for your detailed email; I will indeed read
your blog.
Regards
Tim
From: slurm-users on behalf of Kota Tsuyuzaki
Sent: Wednesday, 21 April 2021
Hi Ewan,
Thank you for the response. That is exactly the source of information I was
looking for. The 'slurm-mig-discovery' tool is perfect.
Cheers
Tim
From: slurm-users on behalf of Ewan Roche
Sent: Wednesday, 21 April 2021 10:12
To: Slurm User Community List
Dear Community,
Trust everyone is well and keeping safe.
We are considering the purchase of nodes with Nvidia A100 GPUs and enabling
the MIG feature, which allows for the creation of instance resource profiles.
The creation of these profiles seems to be straightforward as per the
documentation.
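As a hedged illustration of that creation step (splitting an A100-40GB into
seven 1g.5gb instances; profile ID 19 is taken from nvidia-smi's own profile
listing, so check -lgip on your hardware first):

    $ nvidia-smi -i 0 -mig 1                       # enable MIG mode on GPU 0
    $ nvidia-smi mig -lgip                         # list GPU instance profiles and their IDs
    $ nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C  # create 7x 1g.5gb GIs with default CIs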
Hello Tim,
In the last year, I figured out the behavior of the A100 MIG feature with the
Slurm Workload Manager. At that time, it required a non-default DEVFS mode in
the kernel config to constrain the MIG devices via the Slurm cgroup. With that
setting in place, A100 MIG works well for me, so I suppose it should NOT be
blocking.
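For the cgroup side, a minimal sketch of the stock Slurm settings involved
(standard cgroup.conf options, not necessarily Kota's exact configuration):

    # /etc/slurm/cgroup.conf
    CgroupAutomount=yes
    ConstrainDevices=yes   # fence each job to only the GRES device files it was allocated

With ConstrainDevices=yes, a job should only be able to open the /dev/nvidia*
and /dev/nvidia-cap* files matching its allocation.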
Hi Tim,
we have MIG configured and integrated with Slurm using the slurm-mig-discovery
tools:
https://gitlab.com/nvidia/hpc/slurm-mig-discovery
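Broadly, the discovery tool enumerates the MIG instances on a node and writes
out matching Slurm device configuration. A rough, hypothetical sketch of the
shape of the result (the type name and device paths are placeholders, not the
tool's verbatim output):

    # gres.conf (illustrative only)
    Name=gpu Type=1g.5gb File=/dev/nvidia0
    # ...plus the per-instance /dev/nvidia-cap<N> entries in the cgroup
    # allowed-devices list, so that device constraint can cover them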
The mig-parted tool is great for setting up MIG itself:
https://github.com/NVIDIA/mig-parted
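For completeness, a minimal mig-parted config sketch in its documented YAML
format (the config name all-1g.5gb is just a label of our choosing):

    # config.yaml
    version: v1
    mig-configs:
      all-1g.5gb:
        - devices: all
          mig-enabled: true
          mig-devices:
            "1g.5gb": 7

applied with: nvidia-mig-parted apply -f config.yaml -c all-1g.5gb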
Once set up, MIG instances work fine with Slurm, although th