David,
For monitoring, I use a combination of Netdata and Prometheus. Data is
gathered whenever the nodes are up and stored for history. Yes, when the
nodes are powered down there are gaps in the data, but a gap is simply
interpreted as the node being powered down.
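
To illustrate the "gap means powered down" reading, here is a minimal Python sketch. It is not my actual setup: the function name, the 15-second scrape interval, and the tolerance factor are all assumptions made for the example. It takes the timestamps of the samples stored for one node and reports the stretches where consecutive samples are further apart than expected.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float],
    scrape_interval: float,
    tolerance: float = 2.5,
) -> List[Tuple[float, float]]:
    """Return (last_seen, next_seen) pairs for every gap in the samples.

    A gap is any pair of consecutive samples further apart than
    tolerance * scrape_interval. Both parameters are illustrative
    assumptions, not values taken from a real configuration.
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > tolerance * scrape_interval:
            gaps.append((prev, cur))
    return gaps


if __name__ == "__main__":
    # Hypothetical sample times in seconds: the node reports every 15 s,
    # goes quiet after t=45, and comes back at t=300.
    samples = [0, 15, 30, 45, 300, 315, 330]
    print(find_gaps(samples, scrape_interval=15))  # prints [(45, 300)]
```

Note that a gap on its own cannot tell a powered-down node from a crashed one, so this only works as an interpretation when power saving is the expected reason for the silence.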
For the config, I have no access to DNS for co
David Simpson writes:
> * When you want to make changes to slurm.conf (or anything else) to
> a node which is down due to power saving (during a
> maintenance/reservation) what is your approach? Do you end up with 2
> slurm.confs (one for power saving and one that keeps everything up, to
> work
Hi all,
I'm interested to know what the common approaches are to:
* Monitoring of power saving nodes (e.g. health of the node), when
the monitoring system will potentially see them go up and down. Do you limit
this to BMC-only monitoring/health?
* When you want to make changes to slurm.conf (or anyt