Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2021-01-15 Thread William Brown
I encountered the same problem, and as with munge I created a .te file that can be built into an SELinux policy module and added to the compute nodes to fix this. my-pam_slurm_adopt.te: --- module my-pam_slurm_adopt 1.0; require {
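The .te source above is truncated in the preview, but the generic build-and-load sequence for a local SELinux policy module (filenames here follow the module name in the message; run as root on each compute node) is:

```shell
# Compile the type-enforcement source into a binary module
checkmodule -M -m -o my-pam_slurm_adopt.mod my-pam_slurm_adopt.te
# Package the binary module into an installable policy package
semodule_package -o my-pam_slurm_adopt.pp -m my-pam_slurm_adopt.mod
# Load the policy package into the running SELinux policy
semodule -i my-pam_slurm_adopt.pp
```

This is the standard local-module workflow from policycoreutils; the rules inside the require/allow blocks of the .te file are what actually grant sshd access to the slurmd sockets.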

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2021-01-15 Thread Ole Holm Nielsen
On 10/29/20 12:56 PM, Paul Raines wrote: The debugging was useful. The problem turned out to be that I am running with SELinux enabled due to corporate policy. The issue was that SELinux is blocking sshd access to /var/slurm/spool/d socket files: The documentation https://slurm.schedmd.com/pam_slu

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-29 Thread William Brown
That is interesting, as I run with SELinux enforcing. I will do some more testing of attaching by ssh to nodes with running jobs. William On Thu, 29 Oct 2020, 11:58 Paul Raines, wrote: > The debugging was useful. The problem turned out to be that I am running > with SELINUX enabled due to corp

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-29 Thread Wensheng Deng
Interesting... On Thu, Oct 29, 2020 at 7:56 AM Paul Raines wrote: > The debugging was useful. The problem turned out to be that I am running > with SELinux enabled due to corporate policy. The issue was that SELinux is > blocking sshd access to /var/slurm/spool/d socket files: > > time->Thu Oct 29

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-29 Thread Paul Raines
The debugging was useful. The problem turned out to be that I am running with SELinux enabled due to corporate policy. The issue was that SELinux is blocking sshd access to /var/slurm/spool/d socket files: time->Thu Oct 29 07:53:50 2020 type=AVC msg=audit(1603972430.809:2800): avc: denied { write
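When SELinux AVC denials like the one above appear, a common way to confirm them and generate a matching local policy module (assuming the standard audit and policycoreutils tooling; the module name below is illustrative) is:

```shell
# Show recent AVC denials for the sshd process
ausearch -m avc -c sshd --start recent
# Generate a local policy module covering those denials
# (inspect the generated my-pam_slurm_adopt.te before loading it)
ausearch -m avc -c sshd | audit2allow -M my-pam_slurm_adopt
# Load the generated policy package
semodule -i my-pam_slurm_adopt.pp
```

Always review the rules audit2allow produces before installing them, since it will happily allow anything the audit log shows being denied.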

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-26 Thread Paul Raines
I have ConstrainRAMSpace=yes in cgroup.conf and PrologFlags=Contain,X11 in slurm.conf. I just tried: $ squeue JOBID PARTITION NAME USER ST TIME NODES 808 lcnrtx tcsh raines R 1-22:39:17 1 rtx-03 $ srun --jobid 808 --pty /bin/tcsh ^Csrun:

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-26 Thread Paul Raines
With debugging on I get: Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: Reading slurm.conf file: /etc/slurm/slurm.conf Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 808, stepid = 4294967295 Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 8

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-24 Thread Juergen Salk
Hi Paul, maybe this is totally unrelated, but we also have a similar issue with pam_slurm_adopt in the case that ConstrainRAMSpace=no is set in cgroup.conf and more than one job is running on that node. There is a bug report open at: https://bugs.schedmd.com/show_bug.cgi?id=9355 As a workaround we
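For reference, the cgroup.conf setting the thread keeps coming back to looks like this (an illustrative fragment, not a complete configuration):

```
# cgroup.conf (on the compute nodes)
# ConstrainRAMSpace=no is the condition under which bug 9355 manifests;
# setting it to yes is the configuration Paul reports running.
ConstrainRAMSpace=yes
```

pam_slurm_adopt relies on the job's cgroup hierarchy to adopt incoming ssh processes, which is why the memory-constraint setting in cgroup.conf can affect it at all.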

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-23 Thread Christopher Samuel
Hi Paul, On 10/23/20 10:13 am, Paul Raines wrote: Any clues as to why pam_slurm_adopt thinks there is no job? Do you have PrologFlags=Contain in your slurm.conf? Contain: At job allocation time, use the ProcTrack plugin to create a job container on all allocated compute nodes. This co
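The slurm.conf setting being asked about is a one-line fragment (Paul's later messages show he also enables X11 forwarding alongside it):

```
# slurm.conf (cluster-wide)
# Create a job container on every allocated node at allocation time,
# so pam_slurm_adopt has something to adopt ssh sessions into.
PrologFlags=Contain
```

Without Contain, the extern step that pam_slurm_adopt attaches ssh sessions to is never created, and the module reports no active jobs.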

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-23 Thread Wensheng Deng
Append ‘log_level=debug5’ to the pam_slurm_adopt line in system-auth, restart sshd, and try a new job and ssh session. Then check the log messages in /var/log/secure... On Fri, Oct 23, 2020 at 9:04 PM Paul Raines wrote: > > I am running Slurm 20.02.3 on CentOS 7 systems. I have pam_slurm_adopt > setup
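The suggested change amounts to a PAM configuration line like the following (the control flag and exact placement are illustrative; only the log_level option comes from the message):

```
# /etc/pam.d/system-auth (or the sshd PAM stack)
account    required    pam_slurm_adopt.so log_level=debug5
```

With debug5 set, pam_slurm_adopt logs each candidate job id it finds (as seen in Paul's later debug output), which makes it clear whether the module is failing to find a job or failing to adopt into one.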

[slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-23 Thread Paul Raines
I am running Slurm 20.02.3 on CentOS 7 systems. I have pam_slurm_adopt set up in /etc/pam.d/system-auth and slurm.conf has PrologFlags=Contain,X11. I have also masked systemd-logind. But pam_slurm_adopt always denies login with "Access denied by pam_slurm_adopt: you have no active jobs on this n
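The systemd-logind masking mentioned above is typically done like this on each compute node (a sketch of the usual procedure, not quoted from the message):

```shell
# Prevent systemd-logind from starting; it would otherwise place ssh
# sessions into its own session scope instead of the job's cgroup,
# defeating pam_slurm_adopt's process adoption.
systemctl stop systemd-logind
systemctl mask systemd-logind
```

As the rest of the thread shows, masking logind is necessary but not sufficient: the job container (PrologFlags=Contain) must exist, and nothing else (here, SELinux) may block sshd from reaching the slurmd sockets.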