I encountered the same problem, and as with munge I created a .te file that
can be compiled into a policy module and loaded on the compute nodes to fix this:
my-pam_slurm_adopt.te:
---
module my-pam_slurm_adopt 1.0;
require {
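# (The rest of this policy is cut off in the archive. The lines below are a
# sketch of a typical completion, assuming the denial is sshd_t writing to
# sock_file objects under the slurmd spool directory; take the real type
# names from your own audit log, for example via audit2allow.)
        type sshd_t;
        type var_spool_t;
        class sock_file write;
}

allow sshd_t var_spool_t:sock_file write;

The module can then be built and loaded on each compute node with:

$ checkmodule -M -m -o my-pam_slurm_adopt.mod my-pam_slurm_adopt.te
$ semodule_package -o my-pam_slurm_adopt.pp -m my-pam_slurm_adopt.mod
$ semodule -i my-pam_slurm_adopt.pp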
On 10/29/20 12:56 PM, Paul Raines wrote:
The debugging was useful. The problem turned out to be that I am running
with SELINUX enabled due to corporate policy. The issue was SELINUX is
blocking sshd access to /var/slurm/spool/d socket files:
The documentation https://slurm.schedmd.com/pam_slu
That is interesting, as I run with SELinux enforcing.
I will do some more testing of attaching by ssh to nodes with running jobs.
William
On Thu, 29 Oct 2020, 11:58 Paul Raines, wrote:
> The debugging was useful. The problem turned out to be that I am running
> with SELINUX enabled due to corp
Interesting...
On Thu, Oct 29, 2020 at 7:56 AM Paul Raines
wrote:
> The debugging was useful. The problem turned out to be that I am running
> with SELINUX enabled due to corporate policy. The issue was SELINUX is
> blocking sshd access to /var/slurm/spool/d socket files:
>
> time->Thu Oct 29
The debugging was useful. The problem turned out to be that I am running
with SELINUX enabled due to corporate policy. The issue was SELINUX is
blocking sshd access to /var/slurm/spool/d socket files:
time->Thu Oct 29 07:53:50 2020
type=AVC msg=audit(1603972430.809:2800): avc: denied { write
I have ConstrainRAMSpace=yes in cgroup.conf and PrologFlags=Contain,X11
in slurm.conf
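For reference, and assuming the standard /etc/slurm locations for those files, that corresponds roughly to:

/etc/slurm/cgroup.conf:
---
ConstrainRAMSpace=yes

/etc/slurm/slurm.conf:
---
PrologFlags=Contain,X11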
I just tried
$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    808    lcnrtx     tcsh   raines  R 1-22:39:17      1 rtx-03
$ srun --jobid 808 --pty /bin/tcsh
^Csrun:
With debugging on I get:
Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: Reading slurm.conf
file: /etc/slurm/slurm.conf
Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 808,
stepid = 4294967295
Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 8
Hi Paul,
Maybe this is totally unrelated, but we also have a similar issue with
pam_slurm_adopt when ConstrainRAMSpace=no is set in
cgroup.conf and more than one job is running on that node. There is a
bug report open at:
https://bugs.schedmd.com/show_bug.cgi?id=9355
As a workaround we
Hi Paul,
On 10/23/20 10:13 am, Paul Raines wrote:
Any clues as to why pam_slurm_adopt thinks there is no job?
Do you have PrologFlags=Contain in your slurm.conf?
Contain
At job allocation time, use the ProcTrack plugin to create a job
container on all allocated compute nodes. This co
Append 'log_level=debug5' to the pam_slurm_adopt line in system-auth,
restart sshd, try a new job and ssh session, then check the log messages in
/var/log/secure...
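For example, assuming the usual account entry for pam_slurm_adopt in /etc/pam.d/system-auth, the change amounts to appending the option to that line:

account    required      pam_slurm_adopt.so log_level=debug5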
On Fri, Oct 23, 2020 at 9:04 PM Paul Raines
wrote:
>
> I am running Slurm 20.02.3 on CentOS 7 systems. I have pam_slurm_adopt
> setup
I am running Slurm 20.02.3 on CentOS 7 systems. I have pam_slurm_adopt
set up in /etc/pam.d/system-auth and slurm.conf has PrologFlags=Contain,X11.
I have also masked systemd-logind.
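For context, the logind masking mentioned here is typically done with something like:

$ systemctl mask systemd-logind
$ systemctl stop systemd-logind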
But pam_slurm_adopt always denies login with "Access denied by
pam_slurm_adopt: you have no active jobs on this node".