Re: [slurm-users] node health check

2023-01-31 Thread Brian Johanson
On 1/30/23 10:35 PM, Ratnasamy, Fritz wrote: Hi, Currently, some of our nodes are overloaded. The nhc installed used to check the load and drain the node when it is overloaded. However, for the past few days, it is not showing the state of the node. When I run /usr/sbin/nhc manually, it sa

Re: [slurm-users] node health check

2023-01-30 Thread Ole Holm Nielsen
On 1/31/23 04:35, Ratnasamy, Fritz wrote: Currently, some of our nodes are overloaded. The nhc installed used to check the load and drain the node when it is overloaded. However, for the past few days, it is not showing the state of the node. When I run /usr/sbin/nhc manually, it says 202301

[slurm-users] node health check

2023-01-30 Thread Ratnasamy, Fritz
Hi, Currently, some of our nodes are overloaded. The nhc installed used to check the load and drain the node when it is overloaded. However, for the past few days, it is not showing the state of the node. When I run /usr/sbin/nhc manually, it says 20230130 21:25:14 [slurm] /usr/libexec/nhc/node-