Re: [slurm-users] [External] Munge thinks clocks aren't synced

2020-10-29 Thread Prentice Bisbal
Good catch. I didn't even notice that. I definitely think the ntp.conf file on the head node is restricting access by IP range. Prentice On 10/28/20 3:04 AM, Williams, Gareth (IM&T, Black Mountain) wrote: I’m pretty sure that ntp info indicates ntp is not working. reach=0, so no successful…
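As a rough illustration of how to check this on the affected node (the address below is made up, not from the original post), ntpq shows it directly; the reach column is an octal bitmask of the last eight poll attempts, so 377 means every recent poll succeeded and 0 means none have:

    $ ntpq -pn
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     192.168.1.1     .INIT.          16 u    -   64     0    0.000    0.000   0.000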

Re: [slurm-users] [External] Munge thinks clocks aren't synced

2020-10-29 Thread Prentice Bisbal
Having the head node run as an NTP server is a good idea. I set up my clusters the same way. Is it possible that ntp.conf on the head node has a restrict statement limiting access to it by IP address/range, which would explain why this one node on a different network can't reach it? It sounds like…
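A minimal sketch of what such a restrict block in ntp.conf might look like (the subnets here are placeholders, not from the original post); a node on any other network would be refused time service:

    # allow the head node itself
    restrict 127.0.0.1
    # serve time to the cluster's private network, nothing else
    restrict default ignore
    restrict 10.0.0.0 mask 255.255.255.0 nomodify notrap
    # a compute node on, say, 10.1.0.0/24 would not match and could not sync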

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-29 Thread William Brown
That is interesting, as I run with SELinux enforcing. I will do some more testing of attaching by ssh to nodes with running jobs. William On Thu, 29 Oct 2020, 11:58 Paul Raines, wrote: > The debugging was useful. The problem turned out to be that I am running > with SELINUX enabled due to corporate…
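One quick way to check adoption by hand, assuming a node name of node01 (hypothetical) with one of the user's jobs running on it, is to ssh in and look at the shell's cgroup:

    $ ssh node01 'cat /proc/self/cgroup'
    # with pam_slurm_adopt working, the paths should contain something like
    # .../slurm/uid_<uid>/job_<jobid>/..., i.e. the ssh session was adopted
    # into the running job's cgroup rather than left outside it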

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-29 Thread Wensheng Deng
Interesting... On Thu, Oct 29, 2020 at 7:56 AM Paul Raines wrote: > The debugging was useful. The problem turned out to be that I am running > with SELINUX enabled due to corporate policy. The issue was SELINUX is > blocking sshd access to /var/slurm/spool/d socket files: > > time->Thu Oct 29

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-29 Thread Paul Raines
The debugging was useful. The problem turned out to be that I am running with SELINUX enabled due to corporate policy. The issue was SELINUX is blocking sshd access to /var/slurm/spool/d socket files: time->Thu Oct 29 07:53:50 2020 type=AVC msg=audit(1603972430.809:2800): avc: denied { write…
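For anyone who hits the same denial, a sketch of one common way to confirm it and build a local policy exception (the module name here is arbitrary; review the generated policy before loading it):

    # list recent AVC denials involving sshd
    ausearch -m avc -ts recent | grep sshd
    # turn those denials into a local policy module and load it
    ausearch -m avc -ts recent | audit2allow -M sshd_slurm_adopt
    semodule -i sshd_slurm_adopt.pp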

[slurm-users] Job canceled after reaching QOS limits for CPU time.

2020-10-29 Thread Zacarias Benta
Good morning everyone. I'm having an "issue"; I don't know if it is a "bug or a feature". I've created a QOS: "sacctmgr add qos myqos set GrpTRESMins=cpu=10 flags=NoDecay". I know the limit is too low, but I just wanted to give you guys an example. Whenever a user submits a job and uses this QOS…
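For context, the QOS and its limit can be inspected with sacctmgr, e.g. (format fields as documented in the sacctmgr man page):

    sacctmgr show qos myqos format=Name,Flags,GrpTRESMins
    # with flags=NoDecay the accumulated TRES minutes never decay, so once
    # the group has consumed cpu=10 minutes, further jobs under this QOS
    # are blocked rather than the usage draining back down over time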

[slurm-users] sbatch overallocation

2020-10-29 Thread Max Quast
Okay, thanks for the hint that I should use cgroups. With cgroups the behaviour is as expected. :) Max > On 10/10/20 18:53, Renfro, Michael wrote: > > > * Do you want to ensure that one job requesting 9 tasks (and 1 CPU per > > task) can’t overstep its reservation and take…
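For anyone following the thread, a minimal sketch of the settings involved (values are illustrative; see the slurm.conf and cgroup.conf man pages for details):

    # slurm.conf
    TaskPlugin=task/cgroup,task/affinity

    # cgroup.conf
    ConstrainCores=yes     # keep a job's tasks on the CPUs it was allocated
    ConstrainRAMSpace=yes  # optionally fence memory as well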