The debugging was useful.  The problem turned out to be SELinux, which I
am running enabled due to corporate policy.  SELinux was blocking sshd
access to the socket files in /var/slurm/spool/d:

time->Thu Oct 29 07:53:50 2020
type=AVC msg=audit(1603972430.809:2800): avc: denied { write } for pid=403840 comm="sshd" name="rtx-05_811.4294967295" dev="md122" ino=2228938 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_t:s0 tclass=sock_file permissive=1
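For anyone hitting the same thing, the relevant parts of that AVC record are the denied permission, the source and target SELinux types, and the object class. Below is a minimal Python sketch that pulls those fields out of a record; the function name parse_avc is made up for illustration, and the parsing only assumes the key=value layout visible in the record above, not a complete audit log grammar.

```python
import re


def parse_avc(record: str) -> dict:
    """Extract the denied permissions and key=value fields from one AVC record.

    Hypothetical helper: it only assumes the layout seen in the record above.
    """
    fields = {}
    # The denied permissions sit between braces, e.g. "denied { write }".
    perms = re.search(r"denied\s+\{([^}]*)\}", record)
    fields["permissions"] = perms.group(1).split() if perms else []
    # Remaining data is key=value pairs; some values are double-quoted.
    for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', record):
        fields[key] = value.strip('"')
    return fields


record = (
    'type=AVC msg=audit(1603972430.809:2800): avc: denied { write } for '
    'pid=403840 comm="sshd" name="rtx-05_811.4294967295" dev="md122" '
    'ino=2228938 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 '
    'tcontext=system_u:object_r:var_t:s0 tclass=sock_file permissive=1'
)
avc = parse_avc(record)
source_type = avc["scontext"].split(":")[2]  # type of the process (sshd)
target_type = avc["tcontext"].split(":")[2]  # type of the socket file
print(avc["permissions"], source_type, target_type, avc["tclass"])
```

Read this way, the record says a process in the sshd_t domain was denied write on a sock_file labelled var_t, which matches the socket files under /var/slurm/spool/d.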

-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Mon, 26 Oct 2020 9:26am, Paul Raines wrote:


With debugging on I get:

Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: Reading slurm.conf file: /etc/slurm/slurm.conf
Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 808, stepid = 4294967295
Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 808, stepid = 0
Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: _step_connect: connect() failed dir /var/slurm/spool/d node rtx-03 step 808.4294967295 Permission denied
Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug3: unable to connect to step 808.4294967295 on rtx-03: Permission denied
Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: send_user_msg: Access denied by pam_slurm_adopt: you have no active jobs on this node
Oct 26 09:22:33 rtx-03 sshd[176647]: pam_access(sshd:account): access denied for user `raines' from `10.162.254.11'
Oct 26 09:22:33 rtx-03 sshd[176647]: fatal: Access denied for user raines by PAM account configuration [preauth]
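
The telling entry in that output is the _step_connect failure: the job and its steps are found, but connecting to the step socket fails with "Permission denied", and the "no active jobs" message follows from that. A rough Python sketch for picking such entries out of /var/log/secure is below; the function name and the regular expression are assumptions based only on the message format shown here, not on anything documented by Slurm.

```python
import re

# Pattern modelled on the "_step_connect: connect() failed" message above.
STEP_CONNECT_FAILED = re.compile(
    r"_step_connect: connect\(\) failed dir (?P<spool>\S+) "
    r"node (?P<node>\S+) step (?P<job>\d+)\.(?P<step>\d+) (?P<error>.+)$"
)


def find_connect_failures(lines):
    """Return one dict per log line that reports a failed step connection."""
    failures = []
    for line in lines:
        match = STEP_CONNECT_FAILED.search(line)
        if match:
            failures.append(match.groupdict())
    return failures


# Two of the log lines from the output above, used as sample input.
log = [
    "Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: "
    "found jobid = 808, stepid = 0",
    "Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: _step_connect: "
    "connect() failed dir /var/slurm/spool/d node rtx-03 "
    "step 808.4294967295 Permission denied",
]
failures = find_connect_failures(log)
for failure in failures:
    print(failure["job"], failure["step"], failure["spool"], failure["error"])
```

A "Permission denied" here, rather than a missing step, points at access to the spool directory sockets and not at the job lookup itself.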


-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Fri, 23 Oct 2020 11:12pm, Wensheng Deng wrote:

 Append ‘log_level=debug5’ to the pam_slurm_adopt line in system-auth,
 restart sshd, then try a new job and ssh session. Then check the log
 messages in /var/log/secure...


 On Fri, Oct 23, 2020 at 9:04 PM Paul Raines <rai...@nmr.mgh.harvard.edu>
 wrote:


 I am running Slurm 20.02.3 on CentOS 7 systems.  I have pam_slurm_adopt
 set up in /etc/pam.d/system-auth and slurm.conf has
 PrologFlags=Contain,X11
 I have also masked systemd-logind.

 But pam_slurm_adopt always denies login with "Access denied by
 pam_slurm_adopt: you have no active jobs on this node" even when the
 user most definitely has a job running on the node via srun.

 Any clues as to why pam_slurm_adopt thinks there is no job?

 serena [raines] squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     785    lcnrtx     tcsh   raines  R   19:44:51      1 rtx-03
 serena [raines] ssh rtx-03
 Access denied by pam_slurm_adopt: you have no active jobs on this node
 Authentication failed.



