Any idea why pam_slurm_adopt would work on some nodes but not others? Here is
a log excerpt from one of the failing nodes:

Jan 28 15:38:54 dgx1-1 sshd[1027640]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.10.10.1 user=test.user
Jan 28 15:38:54 dgx1-1 pam_slurm_adopt[1027640]: debug2: _establish_config_source: using config_file=/admin/slurm/slurm-21.08.5/etc/slurm.conf (default)
Jan 28 15:38:54 dgx1-1 pam_slurm_adopt[1027640]: debug:  slurm_conf_init: using config_file=/admin/slurm/slurm-21.08.5/etc/slurm.conf
Jan 28 15:38:54 dgx1-1 pam_slurm_adopt[1027640]: debug:  Reading slurm.conf file: /admin/slurm/slurm-21.08.5/etc/slurm.conf
Jan 28 15:38:54 dgx1-1 pam_slurm_adopt[1027640]: debug:  Reading cgroup.conf file /admin/slurm/slurm-21.08.5/etc/cgroup.conf
Jan 28 15:38:54 dgx1-1 pam_slurm_adopt[1027640]: debug4: found StepId=182409.0
Jan 28 15:38:54 dgx1-1 pam_slurm_adopt[1027640]: send_user_msg: Access denied by pam_slurm_adopt: you have no active jobs on this node
Jan 28 15:38:54 dgx1-1 sshd[1027640]: pam_access(sshd:account): access denied for user `test.user' from `10.10.10.1'

squeue shows the job running on that node:
182409      v100     bash test.user  R    1:43:58      1 dgx1-1

Other nodes using the exact same config work just fine. The debug output
doesn’t show much: the module finds StepId=182409.0 but still reports no active
jobs. Could this be related to cgroups or to the adoption step itself? Where
could I get more information? The only difference I can think of is that the
working nodes were built more recently than the failing ones, but all of them
are patched to the same level and get the same config.
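
One way to gather more information might be to compare the cgroup layout
that a process sees on a working node and on a failing node. The following is
only a minimal sketch under that assumption; nothing in the log above confirms
that the cgroup hierarchy is the cause. It parses the standard
/proc/<pid>/cgroup format (hierarchy-ID:controller-list:cgroup-path) and
reports whether the node looks like cgroup v1, v2, or hybrid. The function
names parse_cgroup_file and cgroup_mode are hypothetical helpers, not part of
Slurm.

```python
from pathlib import Path


def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy_id, controllers, path)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form hierarchy-ID:controller-list:cgroup-path.
        hierarchy_id, controllers, path = line.split(":", 2)
        controller_list = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), controller_list, path))
    return entries


def cgroup_mode(entries):
    """Classify the parsed entries as 'v1', 'v2', 'hybrid', or 'unknown'."""
    # The unified (v2) hierarchy appears as ID 0 with an empty controller list.
    has_v2 = any(hid == 0 and not ctrls for hid, ctrls, _ in entries)
    # Any entry that names controllers belongs to a v1 hierarchy.
    has_v1 = any(ctrls for _, ctrls, _ in entries)
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "unknown"


if __name__ == "__main__":
    cgroup_path = Path("/proc/self/cgroup")
    if cgroup_path.exists():
        parsed = parse_cgroup_file(cgroup_path.read_text())
        print("cgroup mode:", cgroup_mode(parsed))
        for hid, ctrls, path in parsed:
            print(hid, ",".join(ctrls) or "(unified)", path)
    else:
        print("No /proc/self/cgroup here; run this on a Linux compute node.")
```

Running it on one working node and one failing node and comparing the output
would show whether the two groups of nodes present different hierarchies.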
