Hi Matthias

I can't answer your specific question, so this is more of a comment 🙂

We have a system with 8 x Nvidia A40 (48 GB each), where we would like to share 
each GPU between several jobs, e.g. starting 32 jobs with 4 on each GPU. I 
looked into MIG as well, but unfortunately that is not supported by the A40 
hardware (only A30 and A100).
I have tried MPS, but strangely that works only for the first GPU on each node, 
so only one of the 8 GPUs in our system could be shared this way. At least, that 
used to be the case: a couple of weeks ago an "all_sharing" flag was introduced 
for gres.conf, which apparently makes it possible to share all the GPUs with 
MPS. I haven't tried it yet, but it may be worth a try (a rough, untested sketch 
is below the man page excerpt). It should also be possible to configure some 
GPUs as mps and some as gpu resources.

Cheers,

Esben


https://github.com/SchedMD/slurm/blob/master/doc/man/man5/gres.conf.5

all_sharing
To be used on a shared gres. This is the opposite of one_sharing and can be
used to allow all sharing gres (gpu) on a node to be used for shared gres (mps).
NOTE: If a gres has this flag configured it is global, so all other nodes with
that gres will have this flag implied.  This flag is not compatible with
one_sharing for a specific gres.
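
A rough, untested sketch of what I imagine this could look like (node name,
device files and the MPS count are placeholders, and the exact syntax may
need adjusting):

# gres.conf
Name=gpu Type=a40 File=/dev/nvidia[0-7]
# shared gres: all_sharing should allow MPS on all GPUs instead of only the first one
Name=mps Count=800 File=/dev/nvidia[0-7] Flags=all_sharing

# slurm.conf
GresTypes=gpu,mps
NodeName=gpunode01 Gres=gpu:a40:8,mps:800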



________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Matthias 
Leopold <matthias.leop...@meduniwien.ac.at>
Sent: Thursday, January 27, 2022 16:27
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] addressing NVIDIA MIG + non MIG devices in Slurm

Hi,

we have 2 DGX A100 systems which we would like to use with Slurm. We
want to use the MIG feature for _some_ of the GPUs. As I somewhat
suspected, I couldn't find a working setup for this in Slurm yet. I'll
describe the configuration variants I tried after creating the MIG
instances; it might be a longer read, so please bear with me.

1. using slurm-mig-discovery for gres.conf
(https://gitlab.com/nvidia/hpc/slurm-mig-discovery)
- CUDA_VISIBLE_DEVICES: list of indices
-> seems to give a working setup with full flexibility at first, but on
closer inspection the selection of GPU devices is completely
unpredictable (as seen in the output of nvidia-smi inside the Slurm job)

2. using "AutoDetect=nvml" in gres.conf (Slurm docs)
- CUDA_VISIBLE_DEVICES: MIG format (see
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars)

2.1 converting ALL GPUs to MIG
- even a full A100 is converted to a 7g.40gb MIG instance
- gres.conf: "AutoDetect=nvml" only
- slurm.conf Node Def: naming all MIG types (read from slurmd debug log; example below)
-> working setup
-> problem: IPC (MPI) between MIG instances is not possible; this seems
to be a by-design limitation
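
For illustration, such a node definition could look roughly like this (node
name, MIG profile names and counts are only examples; the real ones come
from the slurmd debug log):

# slurm.conf (example only)
GresTypes=gpu
NodeName=dgx01 Gres=gpu:7g.40gb:2,gpu:3g.20gb:6,gpu:1g.5gb:21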

2.2 converting SOME GPUs to MIG
- some A100 are NOT in MIG mode

2.2.1 using "AutoDetect=nvml" only (Variant 1)
- slurm.conf Node Def: Gres with and without type
-> problem: fatal: _foreach_slurm_conf: Some gpu GRES in slurm.conf have
a type while others do not (slurm_gres->gres_cnt_config (26) > tmp_count
(21))

2.2.2 using "AutoDetect=nvml" only (Variant 2)
- slurm.conf Node Def: only Gres without type (sum of MIG + non MIG)
-> problem: different GPU types can't be requested

2.2.3 using partial "AutoDetect=nvml"
- gres.conf: "AutoDetect=nvml" + hardcoding of non MIG GPUs
- slurm.conf Node Def: MIG + non MIG Gres types (rough sketch below)
-> produces a "perfect" config according to slurmd debug log
-> problem: the sanity-check mode of "AutoDetect=nvml" prevents
operation (?)
-> Reason=gres/gpu:1g.5gb count too low (0 < 21) [slurm@2022-01-27T11:23:59]
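
For illustration, the partial autodetection I mean looks roughly like this
(type names, device paths and counts are only examples):

# gres.conf (example only)
AutoDetect=nvml
# hardcoded entries for the A100s that are NOT in MIG mode
Name=gpu Type=a100 File=/dev/nvidia0
Name=gpu Type=a100 File=/dev/nvidia1

# slurm.conf node definition with MIG and non MIG types (example only)
NodeName=dgx01 Gres=gpu:a100:2,gpu:3g.20gb:6,gpu:1g.5gb:21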

2.2.4 using static gres.conf with NVML generated config
- a static gres.conf based on the NVML generated config, where I can
define the type for the non MIG GPUs and also set the UniqueId for the
MIG instances, would be the perfect solution (rough sketch below)
- slurm.conf Node Def: MIG + non MIG Gres types
-> problem: it doesn't work
-> Parsing error at unrecognized key: UniqueId
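
For illustration, a hand written entry of the kind I mean (file path and
UUID are placeholders), which slurmd rejects with the error above:

# gres.conf (example only; the UniqueId key is not accepted)
Name=gpu Type=a100   File=/dev/nvidia0
Name=gpu Type=1g.5gb File=/dev/nvidia-caps/nvidia-cap<N> UniqueId=MIG-<GPU instance UUID>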

Thanks for reading this far. Am I missing something? How can MIG and non
MIG devices be addressed in a cluster? This setup of having MIG and non
MIG devices can't be exotic, since having ONLY MIG devices has severe
disadvantages (see 2.1). Thanks again for any advice.

Matthias
