Thank you both. For interest, this is the health check:

https://github.com/amd/node-scraper/

On Mon, 18 Aug 2025 at 14:01, Bjørn-Helge Mevik via slurm-users <
[email protected]> wrote:

> Ole Holm Nielsen via slurm-users <[email protected]> writes:
>
> > On 8/18/25 13:56, Gerhard Strangar via slurm-users wrote:
> >> John Hearns via slurm-users wrote:
> >>
> >>> I want to run a healthcheck job on all nodes.
> >> And using HealthCheckProgram in the slurm.conf would be too easy?
> >
> > But the HealthCheckProgram=/usr/sbin/nhc is executed only when slurmd
> > is started, and possibly when a new job is started.
>
> That depends on HealthCheckInterval and HealthCheckNodeState.  If
> HealthCheckInterval=N with N > 0, the HealthCheckProgram is run every N
> seconds, given that the node is in one of the HealthCheckNodeState
> states (default: any state).
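
A minimal slurm.conf sketch of the periodic setup described above, assuming NHC is installed at /usr/sbin/nhc as in Ole's message. The 300-second interval is an illustrative value, not one taken from this thread.

```
# slurm.conf fragment (illustrative values)
HealthCheckProgram=/usr/sbin/nhc
# Run the health check every 300 seconds; 0 disables periodic runs.
HealthCheckInterval=300
# Run regardless of node state; ANY is the documented default.
HealthCheckNodeState=ANY
```

Changes to these parameters only take effect once the daemons have re-read slurm.conf.
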
>
> > I think John asked for a way to run NHC on a set of nodes whenever
> > desired by the system administrator, and not at any random time,
> > right? ClusterShell is the ideal tool for making such parallel
> > commands on the cluster.
>
> Yes, for running manually, setting up the Slurm groups in clush is the
> easiest way, IMO.
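
For reference, a sketch of what such a Slurm group source can look like in ClusterShell. It is modelled on the slurm.conf.example file that ClusterShell ships under groups.conf.d; the exact contents of your installed copy may differ, so treat this as an assumption and check it locally.

```ini
# /etc/clustershell/groups.conf.d/slurm.conf (sketch; compare with the shipped example)
[slurmpart,sp]
# Nodes in the partition named by $GROUP
map: sinfo -h -o "%N" -p $GROUP
# All nodes known to Slurm
all: sinfo -h -o "%N"
# Available partition names
list: sinfo -h -o "%R"
```

With a source like this enabled, a command along the lines of "clush -b -w @sp:PARTITION /usr/sbin/nhc" should run NHC on every node of a partition and gather identical output, where PARTITION is a placeholder for one of your partition names.
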
>
> --
> Regards,
> Bjørn-Helge Mevik
>
> --
> slurm-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>