Hello,

We're using cgroups to restrict access to the GPUs.

What I found particularly helpful are the slides by Marshall Garey from last year's Slurm User Group Meeting: https://slurm.schedmd.com/SLUG19/cgroups_and_pam_slurm_adopt.pdf (NVML didn't work for us for some reason I can't recall, but listing the GPU device files explicitly was not a big deal).
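
In case it helps, the pieces that matter look roughly like the following (node names and device paths are just examples, and the details may differ for your Slurm version):

    # slurm.conf -- task/cgroup must be active for device constraining
    TaskPlugin=task/cgroup

    # cgroup.conf
    CgroupAutomount=yes
    ConstrainDevices=yes

    # gres.conf -- GPU device files listed explicitly instead of AutoDetect=nvml
    NodeName=gpunode[01-02] Name=gpu File=/dev/nvidia0
    NodeName=gpunode[01-02] Name=gpu File=/dev/nvidia1

With ConstrainDevices=yes, the task/cgroup plugin only grants a job access to the device files of the GPUs it was actually allocated, so a job submitted without any --gres=gpu request simply doesn't see the GPUs.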

Best,
Christoph


On 25/08/2020 16.12, Willy Markuske wrote:
Hello,

I'm trying to restrict access to GPU resources on a cluster I maintain for a research group. There are two nodes in a partition with GRES gpu resources defined. Users can access these resources by submitting their jobs to the gpu partition and specifying a gres=gpu request.

When a user includes the flag --gres=gpu:#, Slurm properly allocates the requested number of GPUs; if a user requests only 1 GPU, they see only CUDA_VISIBLE_DEVICES=1. However, if a user does not include the --gres=gpu:# flag, they can still submit a job to the partition and are then able to see all the GPUs. This has led to some bad actors running jobs on GPUs that other users have allocated and causing out-of-memory errors on those GPUs.
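
For example (the job script name is just a placeholder):

    sbatch -p gpu --gres=gpu:1 job.sh    # job is limited to its one allocated GPU
    sbatch -p gpu job.sh                 # no GRES request, yet the job sees every GPU on the node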

Is it possible to require users to specify --gres=gpu:# in order to submit to a partition, and where would I find the documentation on doing so? So far, reading the GRES documentation doesn't seem to have yielded anything on this issue specifically.

Regards,

--

Willy Markuske

HPC Systems Engineer


Research Data Services

P: (858) 246-5593


--
Dr. Christoph Brüning
Universität Würzburg
Rechenzentrum
Am Hubland
D-97074 Würzburg
Tel.: +49 931 31-80499
