Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-23 Thread Christopher Samuel
Hi Paul,
On 10/23/20 10:13 am, Paul Raines wrote:
> Any clues as to why pam_slurm_adopt thinks there is no job?
Do you have PrologFlags=Contain in your slurm.conf?
Contain: At job allocation time, use the ProcTrack plugin to create a job container on all allocated compute nodes. This co

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-23 Thread Wensheng Deng
Append ‘log_level=debug5’ to the pam_slurm_adopt line in system-auth, restart sshd, then try a new job and ssh session. Then check the log messages in /var/log/secure...
On Fri, Oct 23, 2020 at 9:04 PM Paul Raines wrote:
> I am running Slurm 20.02.3 on CentOS 7 systems. I have pam_slurm_adopt setup
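The edit suggested in this reply can be sketched as a small text transform. This is only an illustration: the helper name `add_pam_option` and the sample PAM lines are hypothetical, and the real system-auth file on a given node will look different. Only the module name and the `log_level=debug5` option come from the message.

```python
def add_pam_option(pam_text, module="pam_slurm_adopt.so", option="log_level=debug5"):
    """Append `option` to every non-comment PAM line that loads `module`.

    Hypothetical helper: it only rewrites text and does not touch any file
    or restart sshd.
    """
    out = []
    for line in pam_text.splitlines():
        tokens = line.split()
        is_comment = line.lstrip().startswith("#")
        # Only touch lines that load the module and lack the option already.
        if not is_comment and module in tokens and option not in tokens:
            line = line.rstrip() + " " + option
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative input, not the poster's actual configuration.
sample = (
    "account     required      pam_unix.so\n"
    "account     required      pam_slurm_adopt.so\n"
)
print(add_pam_option(sample), end="")
```

After applying such a change by hand, sshd still has to be restarted and a fresh job and ssh session started, as the reply describes.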

[slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-23 Thread Paul Raines
I am running Slurm 20.02.3 on CentOS 7 systems. I have pam_slurm_adopt set up in /etc/pam.d/system-auth, and slurm.conf has PrologFlags=Contain,X11. I have also masked systemd-logind. But pam_slurm_adopt always denies login with "Access denied by pam_slurm_adopt: you have no active jobs on this n

[slurm-users] Help decoding step ID in slurmd log

2020-10-23 Thread Sebastian T Smith
Hi, I'm performing diagnostics on an application that isn't terminating correctly. While reviewing slurmd logs I found a couple of lines I need help decoding (logs are normal):
Line 45: [2020-10-23T14:30:22.610] [2547451.batch] Sent signal 18 to 2547451.4294967294
Line 46: [2020-10-23T14:30:2
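One way to read the `2547451.4294967294` token on line 45 is to look at the step number in hexadecimal. The sketch below assumes the reserved step ID values used by Slurm releases of that era (0xfffffffe for the batch script step, 0xffffffff for the extern container step; later releases renumbered them) and the Linux x86 signal numbering. Both tables are assumptions to check against the installed headers, and `decode_step` is a hypothetical helper.

```python
# Assumed reserved step IDs for Slurm around 20.02; verify against slurm.h.
SPECIAL_STEP_IDS = {
    0xFFFFFFFE: "batch script step (SLURM_BATCH_SCRIPT)",
    0xFFFFFFFF: "extern container step (SLURM_EXTERN_CONT)",
}

# Assumed Linux x86 signal numbers; other platforms number them differently.
LINUX_SIGNALS = {9: "SIGKILL", 15: "SIGTERM", 18: "SIGCONT"}


def decode_step(job_step):
    """Split 'jobid.stepid' and name the step if it is a reserved value."""
    job_id, step_id = job_step.split(".")
    step = int(step_id)
    return int(job_id), SPECIAL_STEP_IDS.get(step, f"regular step {step}")


job, step = decode_step("2547451.4294967294")
print(job, hex(4294967294), step)
print("signal 18 is", LINUX_SIGNALS[18])
```

Under these assumptions the line would mean that the batch step of job 2547451 was sent SIGCONT, which matches the `[2547451.batch]` prefix in the same log line.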

Re: [slurm-users] Slurm not enforcing gres requests at submission

2020-10-23 Thread Stephan Schott
Well, I might be wrong, but afaik, the SelectType just tells your cluster which resources are going to be consumables (in this case, trackable resources, which are not only GPUs). If you want to specify which conditions have to be met when using the cluster or a particular partition, then you h

Re: [slurm-users] Array jobs vs Fairshare

2020-10-23 Thread Stephan Schott
Apparently there is not much out there regarding this. To me it seems that once an array job has started, it runs without checking whether the user's Fairshare factor would actually allow another step to start. That would be far from ideal, as it just opens the door for malicious usage of the

Re: [slurm-users] Partition QOS limit not being enforced

2020-10-23 Thread Matthew Brown
Yes, I think you need AccountingStorageEnforce to have at least "limits" set. See the AccountingStorageEnforce section here: https://slurm.schedmd.com/accounting.html We use Partition QOS like you described and our "scontrol show config" shows "AccountingStorageEnforce = associations,limits,qos,s
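The check described in this reply can be sketched as a parser over `scontrol show config` output. The function name `enforce_flags` and the sample text are hypothetical, and the sample value is illustrative rather than the poster's full setting, which is cut off in this listing.

```python
def enforce_flags(config_text):
    """Return the set of AccountingStorageEnforce flags found in the text.

    Expects 'Key = value' lines such as those printed by `scontrol show config`.
    Returns an empty set when the key is absent.
    """
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "AccountingStorageEnforce":
            return {flag.strip().lower() for flag in value.split(",") if flag.strip()}
    return set()


# Illustrative output fragment, not copied from a real cluster.
sample_config = (
    "AccountingStorageType   = accounting_storage/slurmdbd\n"
    "AccountingStorageEnforce = associations,limits,qos\n"
)
flags = enforce_flags(sample_config)
print("limits enforced:", "limits" in flags)
```

On a live system the text could be captured from `scontrol show config` and passed to the same function; if "limits" is missing from the returned set, the QOS limits are probably not being enforced, as the reply suggests.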