[slurm-users] A question about slurm nodes event time

2020-05-14 Thread hwj_0...@163.com
To whom it may concern, I have a question about node events and would like to ask for help. When a node that has a job running is taken offline, the command 'sacctmgr show event' does not show the node's offline record; the record only becomes visible after the job has completed. Moreover,

Re: [slurm-users] Munge decode failing on new node

2020-05-14 Thread dean.w.schulze
This problem turned out to be that the new node was on a different subnet than the other nodes. Once our network admin opened up ports 6817, 6818, and 6188 between the subnets the new node worked. Thanks for all the responses. From: slurm-users On Behalf Of Riebs, Andy Sent: Friday, Ap

Re: [slurm-users] [External] Re: Node suspend / Power saving - for *idle* nodes only?

2020-05-14 Thread Florian Zillner
Well, the documentation is rather clear on this: "SuspendTime: Nodes becomes eligible for power saving mode after being idle or down for this number of seconds." A drained node is neither idle nor down in my mind. Thanks, Florian From: slurm-users on behalf of

Re: [slurm-users] QOS cutting off users before CPU limit is reached

2020-05-14 Thread Williams, Jenny Avis
Try suspending and resuming the user's pending jobs to force a re-evaluation. If the user is not in the zone of jobs that is evaluated, i.e., if enough higher-priority jobs have dropped in ahead, then this job may not have been evaluated for scheduling since a point in time when the user was indeed p

Re: [slurm-users] Node suspend / Power saving - for *idle* nodes only?

2020-05-14 Thread Steffen Grunewald
On Thu, 2020-05-14 at 13:10:04 +, Florian Zillner wrote: > Hi, > > I'm experimenting with slurm's power saving feature and shutdown of "idle" > nodes works in general, also the power up works when "idle~" nodes are > requested. > So far so good, but slurm is also shutting down nodes that are

[slurm-users] Node suspend / Power saving - for *idle* nodes only?

2020-05-14 Thread Florian Zillner
Hi, I'm experimenting with slurm's power saving feature and shutdown of "idle" nodes works in general, also the power up works when "idle~" nodes are requested. So far so good, but slurm is also shutting down nodes that are not explicitly "idle". Previously I drained a node to debug something o