Hi, I am currently encountering an issue with Slurm's GPU resource limitation.
I have attempted to restrict the number of GPUs a user can utilize by executing
the following command:
sacctmgr modify user lyz set MaxTRES=gres/gpu=2
This command is intended to limit user 'lyz' to a maximum of 2 GPUs; however, the
limit does not seem to be enforced.
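As a sanity check, the recorded limit can be confirmed with sacctmgr, and note that
MaxTRES is only enforced when AccountingStorageEnforce includes "limits"; the exact
format string below is just an illustration:
[root@head1 ~]# sacctmgr show assoc where user=lyz format=User,Account,MaxTRES
[root@head1 ~]# scontrol show config | grep AccountingStorageEnforce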
Hi, Chris.
The cgroup.conf on my GPU node is the same as on the head node. Its contents are as
follows:
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
I'll try a higher version of Slurm.
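One way to tell whether the device cgroup is actually applied on the GPU node is to
list the visible GPUs from inside an allocation, for example (the partition name is
taken from a later message; this is only an illustrative check):
[root@head1 ~]# srun -p gpu --gres=gpu:1 nvidia-smi -L
With ConstrainDevices=yes working, only the single allocated GPU should be listed.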
Hi Chris!
I didn't modify the cgroup configuration file; I only upgraded the Slurm
version.
After that, the limits were enforced successfully.
It's quite odd.
lyz
Hi, Christopher. Thank you for your reply.
I have already modified the cgroup.conf configuration file in Slurm as follows:
vim /etc/slurm/cgroup.conf
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
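The settings below this header are cut off in the message; based on the identical
file quoted earlier for the GPU node, they would presumably be:
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes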
Hi, Sean. It's the latest Slurm version.
[root@head1 ~]# sinfo --version
slurm 22.05.3
And this is the content of gres.conf on the GPU node.
# This section of this file was automatically generated by cmd. Do not edit manually!
# BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
Name=gpu File=/dev/nvidi
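The device path above is cut off in the message. For illustration only, an
autogenerated GPU entry in gres.conf usually looks something like the line below;
the device range shown here is an assumption, not the actual content of this node's
file:
Name=gpu File=/dev/nvidia[0-3]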
Hi, Chris. Thank you for continuing to pay attention to this issue.
I followed your instructions, and this is the output:
[root@head1 ~]# systemctl cat slurmd | fgrep Delegate
Delegate=yes
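For context, Delegate=yes normally comes from the slurmd systemd unit itself or from
a drop-in override, roughly like the sketch below (the drop-in path is illustrative;
the packaged unit file may already ship with it):
# /etc/systemd/system/slurmd.service.d/override.conf
[Service]
Delegate=yes
[root@node11 ~]# systemctl daemon-reload && systemctl restart slurmd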
lyz
Hi, Chris.
Thank you again for your instruction.
I've tried version 23.11.10. It does work.
When I ran the script using the following command, it successfully restricted
the usage to the specified CUDA devices:
srun -p gpu --gres=gpu:2 --nodelist=node11 python test.py
And when I checked the GPUs inside the job, only the two allocated devices were
visible.
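The test.py referenced above is not shown in the thread; a minimal sketch of the kind
of check it might perform (purely hypothetical, standard library only, and relying on
nvidia-smi being installed on the node) could be:

import os
import subprocess

# Show which devices Slurm exposes to this job step.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))

# List the GPUs actually visible to the process; with ConstrainDevices=yes
# only the allocated GPUs should appear here.
print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)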
Hi, Sean.
I followed your instructions and added ConstrainDevices=yes to the
/etc/slurm/cgroup.conf file on the server node, and then restarted the relevant
services on both the server and the client.
However, I still can't enforce the restriction in the Python program.
It seems like the restriction is not being applied.
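For reference, since cgroup.conf is read by slurmd, it has to be present (with
ConstrainDevices=yes) on the GPU compute nodes as well, not only on the head node,
and the restarts mentioned above would typically amount to something like this
(assuming systemd-managed services, with the node name taken from an earlier
message):
[root@head1 ~]# systemctl restart slurmctld
[root@node11 ~]# systemctl restart slurmd
Also note that the device constraint only applies to processes launched through
Slurm: a job must request GPUs with --gres (or --gpus) to be granted access to them,
and a Python program started directly on the node, outside srun/sbatch, is not
confined at all.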