Ole Holm Nielsen via slurm-users <[email protected]> writes:
> On 8/18/25 13:56, Gerhard Strangar via slurm-users wrote:
>> John Hearns via slurm-users wrote:
>>
>>> I want to run a healthcheck job on all nodes.
>> And using HealthCheckProgram in the slurm.conf would be too easy?
>
> But the HealthCheckProgram=/usr/sbin/nhc is executed only when slurmd
> is started, and possibly when a new job is started.

That depends on HealthCheckInterval and HealthCheckNodeState. If
HealthCheckInterval=N with N > 0, the HealthCheckProgram is run every N
seconds, provided that the node is in one of the HealthCheckNodeState
states (default: any state).

> I think John asked for a way to run NHC on a set of nodes whenever
> desired by the system administrator, and not at any random time,
> right? ClusterShell is the ideal tool for making such parallel
> commands on the cluster.

Yes, for running manually, setting up the Slurm groups in clush is the
easiest way, IMO.

-- 
Regards,
Bjørn-Helge Mevik
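
For reference, the periodic behaviour described above corresponds to a
slurm.conf fragment along the following lines. This is only a sketch:
the NHC path is the one quoted in the thread, while the 300-second
interval is an illustrative assumption and not a value anyone in the
thread proposed.

```
# slurm.conf fragment (sketch; the interval value is an assumption)
HealthCheckProgram=/usr/sbin/nhc   # path as quoted in the thread
HealthCheckInterval=300            # N > 0: run every N seconds; 0 disables periodic runs
HealthCheckNodeState=ANY           # the default: run regardless of node state
```

Restricting HealthCheckNodeState (for example to IDLE) would limit the
periodic runs to nodes in that state, if running NHC alongside jobs is
a concern.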
