Guillaume,

Check out the 2018 slurm-users thread "pam_slurm_adopt does not constrain memory?", which explains the issues with systemd-logind.

Also: https://bugs.schedmd.com/show_bug.cgi?id=5920
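If it helps, here is a rough sketch for checking your PAM setup. It is a hypothetical helper, not part of Slurm, and it only reads files and reports. It looks for two things: an active `account ... pam_slurm_adopt.so` rule, and any active `session ... pam_systemd.so` rule. The second one registers the ssh session with systemd-logind, which, per that thread, can move the session out of the job's cgroup. The sketch assumes standard PAM line syntax. Which files carry pam_systemd varies by distro (e.g. common-session on Debian/Ubuntu, password-auth or system-auth on RHEL-likes), so pass whichever files your sshd stack includes.

```python
"""Flag PAM rules relevant to pam_slurm_adopt and systemd-logind.

Hypothetical helper, not part of Slurm: it only reads PAM config text and
reports findings; it never edits anything.
"""
import sys
from pathlib import Path


def pam_rules(pam_text):
    """Yield (line number, type, module name) for each active PAM rule."""
    for lineno, raw in enumerate(pam_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 3:
            continue  # blank line, comment, or e.g. "@include common-session"
        rule_type = fields[0].lstrip("-")  # "-session": skip if module is missing
        if fields[1].startswith("["):
            # Bracketed control such as "[success=1 default=ignore]" may
            # contain spaces, so the module follows the field that closes it.
            ends = [i for i, f in enumerate(fields) if f.endswith("]")]
            if not ends or ends[0] + 1 >= len(fields):
                continue
            module = fields[ends[0] + 1]
        else:
            module = fields[2]
        yield lineno, rule_type, Path(module).name  # "/lib/.../x.so" -> "x.so"


def adopt_enabled(pam_text):
    """True if an active account rule loads pam_slurm_adopt.so."""
    return any(t == "account" and m == "pam_slurm_adopt.so"
               for _, t, m in pam_rules(pam_text))


def pam_systemd_lines(pam_text):
    """Line numbers of active session rules that load pam_systemd.so."""
    return [n for n, t, m in pam_rules(pam_text)
            if t == "session" and m == "pam_systemd.so"]


if __name__ == "__main__":
    # Example: python check_pam.py /etc/pam.d/sshd /etc/pam.d/common-session
    texts = {p: Path(p).read_text() for p in sys.argv[1:]}
    if not any(adopt_enabled(t) for t in texts.values()):
        print("no active 'account ... pam_slurm_adopt.so' rule found")
    for path, text in texts.items():
        for n in pam_systemd_lines(text):
            print(f"{path}:{n}: pam_systemd.so is active")
```

Anything it flags is just a line to look at, not an automatic fix. The Slurm pam_slurm_adopt documentation also discusses disabling pam_systemd and systemd-logind, so check it against your distro before changing PAM files.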

-b

On 2/9/23 7:09 AM, Guillaume Lechantre wrote:
Hi everyone,

I'm in charge of the new GPU cluster in my lab.

I'm using cgroups to restrict access to resources, especially GPUs.
This works fine when users go through the connection created by Slurm.

I am using the pam_slurm_adopt.so module to give users ssh access to a node if they already have a job running on it. When connecting to the node through ssh, the user can see and use all the GPUs on the node, even if they requested only one. This is a real problem, as most users work on the cluster by connecting their IDE to it over ssh.

I can't find any related resources on the internet or in the list archives. Do you have any idea what I am missing?
I'm not an expert; I've only been working in system administration for 5 months...


Thanks in advance,

Guillaume
