Re: [slurm-users] Best method to determine if a node is down

2021-06-27 Thread Marcus Boden
Hi Doug,

Slurm has the strigger[1] mechanism, which can do exactly that; the man page even has your use case as an example. It works quite well for us.

Best,
Marcus

[1] https://slurm.schedmd.com/strigger.html

On 26.06.21 19:10, Doug Niven wrote:
Hi Folks, I’d like to set up an email notificati

[slurm-users] Best method to determine if a node is down

2021-06-26 Thread Doug Niven
Hi Folks,

I’d like to set up an email notification, perhaps via cron (unless there’s a better method), to alert the sysadmin when a Slurm node is down and/or not firing off jobs...

For example, in the NODELIST(REASON) column of ‘squeue’ I recently saw: (Nodes required for job are DOWN, DRAINED or re
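
For readers who want the cron route rather than strigger, here is a minimal sketch of such a check, not a tested production script. It assumes `sinfo` is on the PATH of the cron user and that any node state beginning with "down", "drain" or "fail" should be reported; that list of prefixes, and the function names `find_bad_nodes`, `query_sinfo` and `main`, are illustrative choices, not something from this thread.

```python
import subprocess

# Assumption: state prefixes that count as "needs attention".
# Adjust to taste; sinfo may append flags such as "*" to a state.
BAD_STATE_PREFIXES = ("down", "drain", "fail")


def find_bad_nodes(sinfo_output):
    """Return a list of (node, state) pairs for nodes in a bad state.

    Expects one "<node> <state>" pair per line. A node listed in
    several partitions is reported only once.
    """
    bad = []
    seen = set()
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        node, state = parts
        if node in seen:
            continue
        if state.lower().startswith(BAD_STATE_PREFIXES):
            seen.add(node)
            bad.append((node, state))
    return bad


def query_sinfo():
    """Run sinfo and return its output as text.

    -h drops the header, -N prints one line per node,
    %N is the node name and %T the extended node state.
    """
    result = subprocess.run(
        ["sinfo", "-h", "-N", "-o", "%N %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def main():
    # Print only when something is wrong, so a quiet run sends no mail.
    for node, state in find_bad_nodes(query_sinfo()):
        print(f"{node} {state}")
```

Calling `main()` from a script run by cron would print one line per affected node. Cron normally mails any output of a job to the crontab owner (or to the MAILTO address, if set), so an empty run stays silent. Unlike strigger, this polls on a schedule and will repeat the report on every run until the node recovers.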