One of our researchers asked whether it was possible to require a
job to use NVLink-ed pairs of GPUs.
I see that there is a support ticket on the SchedMD site which
covers this (https://support.schedmd.com/show_bug.cgi?id=15995). That
ticket is a few years old though. Does anyone happen to know whether
support for this has been added in newer releases of SLURM?
The cluster in question does use "AutoDetect=nvml" in its gres.conf
and the output of "slurmd -G" shows that SLURM is aware of the NVLink
pairs, so I assume the scheduler already tries to use that information.
What I want to know is whether an end user can add something to a job
(a constraint, for example) so that it only runs on an NVLink-ed pair
of GPUs.
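
To illustrate what I mean by SLURM being aware of the pairs, here is a minimal Python sketch that reads the "Links=" field from text shaped like "slurmd -G" output and lists which GPU indices are connected. The sample text is hypothetical (it is not copied from our cluster, and the exact output format differs between SLURM versions), the function name nvlink_pairs is my own, and the sketch assumes that the position of each entry in Links matches the device's Index value.

```python
import re

# Hypothetical sample shaped like `slurmd -G` output. This is an assumption
# for illustration only; real output varies between SLURM versions.
SAMPLE = """\
Gres Name=gpu Type=a100 Count=1 Index=0 File=/dev/nvidia0 Links=-1,4,0,0
Gres Name=gpu Type=a100 Count=1 Index=1 File=/dev/nvidia1 Links=4,-1,0,0
Gres Name=gpu Type=a100 Count=1 Index=2 File=/dev/nvidia2 Links=0,0,-1,4
Gres Name=gpu Type=a100 Count=1 Index=3 File=/dev/nvidia3 Links=0,0,4,-1
"""


def nvlink_pairs(text):
    """Return sorted (i, j) GPU index pairs with a positive Links count."""
    pairs = set()
    for line in text.splitlines():
        idx = re.search(r"\bIndex=(\d+)", line)
        links = re.search(r"\bLinks=([-\d,]+)", line)
        if not (idx and links):
            continue  # skip lines that carry no GPU link information
        i = int(idx.group(1))
        for j, count in enumerate(int(x) for x in links.group(1).split(",")):
            # In gres.conf terms, -1 marks the device itself and 0 means
            # no connection; any positive count is treated as linked.
            if count > 0 and i != j:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


print(nvlink_pairs(SAMPLE))
```

On a real node the input would come from running "slurmd -G" rather than from a hard-coded string. This only shows how to read the topology; it does not make the scheduler enforce anything, which is the part I am asking about.
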
I know there are other ways to implement this, such as requiring jobs
to use an even number of GPUs, perhaps only on some nodes so that
single-GPU jobs can still run on the rest. I'm specifically asking about
a flag or setting that users could apply to their own jobs. If such a
thing exists, I'd love to hear about it. Thanks!
--
*Mr. Marcus Lauer*
Systems Administrator
Penn Engineering
University of Pennsylvania
https://www.seas.upenn.edu/