That is interesting, as I run with SELinux enforcing. I will do some more testing of attaching by ssh to nodes with running jobs.

William

On Thu, 29 Oct 2020, 11:58 Paul Raines, <rai...@nmr.mgh.harvard.edu> wrote:

> The debugging was useful. The problem turned out to be that I am running
> with SELinux enabled due to corporate policy. The issue was that SELinux is
> blocking sshd access to the /var/slurm/spool/d socket files:
>
> time->Thu Oct 29 07:53:50 2020
> type=AVC msg=audit(1603972430.809:2800): avc: denied { write } for pid=403840 comm="sshd" name="rtx-05_811.4294967295" dev="md122" ino=2228938 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_t:s0 tclass=sock_file permissive=1
>
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
> On Mon, 26 Oct 2020 9:26am, Paul Raines wrote:
>
> > With debugging on I get:
> >
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: Reading slurm.conf file: /etc/slurm/slurm.conf
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 808, stepid = 4294967295
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 808, stepid = 0
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: _step_connect: connect() failed dir /var/slurm/spool/d node rtx-03 step 808.4294967295 Permission denied
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug3: unable to connect to step 808.4294967295 on rtx-03: Permission denied
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: send_user_msg: Access denied by pam_slurm_adopt: you have no active jobs on this node
> > Oct 26 09:22:33 rtx-03 sshd[176647]: pam_access(sshd:account): access denied for user `raines' from `10.162.254.11'
> > Oct 26 09:22:33 rtx-03 sshd[176647]: fatal: Access denied for user raines by PAM account configuration [preauth]
> >
> > -- Paul Raines (http://help.nmr.mgh.harvard.edu)
> >
> >
> > On Fri, 23 Oct 2020 11:12pm, Wensheng Deng wrote:
> >
> >> Append ‘log_level=debug5’ to the pam_slurm_adopt line in system-auth,
> >> restart sshd, and try a new job and ssh session. Then check the log
> >> messages in /var/log/secure...
> >>
> >>
> >> On Fri, Oct 23, 2020 at 9:04 PM Paul Raines <rai...@nmr.mgh.harvard.edu>
> >> wrote:
> >>
> >>> I am running Slurm 20.02.3 on CentOS 7 systems. I have pam_slurm_adopt
> >>> set up in /etc/pam.d/system-auth and slurm.conf has
> >>> PrologFlags=Contain,X11
> >>> I also have masked systemd-logind
> >>>
> >>> But pam_slurm_adopt always denies login with "Access denied by
> >>> pam_slurm_adopt: you have no active jobs on this node" even when the
> >>> user most definitely has a job running on the node via srun
> >>>
> >>> Any clues as to why pam_slurm_adopt thinks there is no job?
> >>>
> >>> serena [raines] squeue
> >>>   JOBID PARTITION  NAME    USER ST     TIME NODES NODELIST(REASON)
> >>>     785    lcnrtx  tcsh  raines  R 19:44:51     1 rtx-03
> >>> serena [raines] ssh rtx-03
> >>> Access denied by pam_slurm_adopt: you have no active jobs on this node
> >>> Authentication failed.
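
A note for anyone reading this thread later with the same symptom: the AVC record Paul quotes already says what was denied. The process is sshd (type sshd_t), the permission is write, and the target is a sock_file labelled var_t whose name matches the job step socket. The following is only a rough sketch of pulling those fields out of such a record with Python; it was not posted in the thread, and `parse_avc` and `context_type` are hypothetical helper names invented for illustration.

```python
import re

# The AVC record quoted in the thread, joined onto one line.
AVC_LINE = (
    'type=AVC msg=audit(1603972430.809:2800): avc: denied { write } for '
    'pid=403840 comm="sshd" name="rtx-05_811.4294967295" dev="md122" '
    'ino=2228938 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 '
    'tcontext=system_u:object_r:var_t:s0 tclass=sock_file permissive=1'
)


def parse_avc(line):
    """Return the fields of one AVC denial as a dict, or None if the
    line is not an AVC denial. (Hypothetical helper, not a Slurm or
    audit API.)"""
    m = re.search(r'avc:\s+denied\s+\{([^}]*)\}\s+for\s+(.*)', line)
    if m is None:
        return None
    fields = {'perms': m.group(1).split()}
    # The rest of the record is key=value pairs; values may be quoted.
    for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', m.group(2)):
        fields[key] = value.strip('"')
    return fields


def context_type(context):
    """Return the type part of an SELinux context (user:role:type:level)."""
    return context.split(':')[2]


record = parse_avc(AVC_LINE)
print(record['comm'], record['perms'], record['tclass'])
# prints: sshd ['write'] sock_file
print(context_type(record['scontext']), '->', context_type(record['tcontext']))
# prints: sshd_t -> var_t
```

Read this way, the record lines up with the pam_slurm_adopt debug output earlier in the thread: the "connect() failed ... Permission denied" message is sshd being refused write access to the step socket under /var/slurm/spool/d.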