Guillaume,
Check out the slurm-users thread from 2018 "pam_slurm_adopt does not
constrain memory?" which explains the issues with systemd-logind.
Also: https://bugs.schedmd.com/show_bug.cgi?id=5920
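To confirm whether an ssh session actually landed in the job's cgroup,
a quick check like the one below may help. It is only a minimal sketch:
the function name slurm_job_ids is made up for illustration, and the
pattern assumes that Slurm cgroup paths contain a "job_<id>" component,
which may not match every cgroup plugin or version.

```python
import re
from pathlib import Path

# Assumption: Slurm-managed cgroup paths contain a "job_<id>" component.
JOB_RE = re.compile(r"/job_(\d+)(?:/|$)")


def slurm_job_ids(cgroup_text):
    """Return the set of job ids found in /proc/<pid>/cgroup content."""
    ids = set()
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        match = JOB_RE.search(parts[2])
        if match:
            ids.add(int(match.group(1)))
    return ids


if __name__ == "__main__":
    # Inspect the cgroup membership of the current process.
    text = Path("/proc/self/cgroup").read_text()
    jobs = slurm_job_ids(text)
    if jobs:
        print("session is inside cgroup(s) for job(s):", sorted(jobs))
    else:
        print("no job_<id> cgroup found; session was probably not adopted")
```

Run it from an ssh session on the compute node. If no job id shows up,
the session is likely sitting in a systemd user slice instead of the
job's cgroup, which is the situation the thread above describes.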
-b
On 2/9/23 7:09 AM, Guillaume Lechantre wrote:
Hi everyone,
I'm in charge of the new GPU cluster in my lab.
I'm using cgroups to restrict access to resources, especially GPUs.
It works fine when users use the connection created by Slurm.
I am using the pam_slurm_adopt.so module to give ssh access to a node
if the user already has a job running on it.
When connecting to the node through ssh, the user can see and use all
the GPUs of the node, even if they asked for just one.
This is really problematic, as most users use the cluster by connecting
their IDE to it over ssh.
I can't find any related resources on the internet or in the old
mails. Do you have any idea what I am missing?
I'm not an expert, and I have only been working in system
administration for 5 months...
Thanks in advance,
Guillaume