On 8/19/25 21:25, Jennings, Michael E via slurm-users wrote:
> Have you by chance given the `dev` branch a try?  All our production servers
> currently run `lbnl-nhc-1.5-0.82.gf8dc.el8.noarch` built from the `dev` branch,
> have been for some time now, and it's been rock solid.  Our RHEL-based clusters
> also use this version.  Our HPE/Cray Shasta clusters, including our largest
> (classified) clusters Crossroads, Tycho, and Venado, use a variant.  (Long
> story short, I've merged in all my changes into a separate branch, but the
> reverse is not yet true.)  This variant is, at present, COS/SLES-specific, but
> it has quite a few useful additional checks (many of them Cray-centric)
> contributed by other LANL folks that I haven't had a chance to upstream yet.

Following Michael's recommendation, I wanted to try out version 1.5 of NHC from the 'dev' branch and build the RPM package he referred to.

Since I'm not a software developer, I had to work out the detailed build steps for myself. They may be trivial to some of you and stumbling blocks to others. This is what I came up with:

$ git clone https://github.com/mej/nhc.git
$ cd nhc
$ git switch dev              # Switch to the 'dev' branch
$ git status                  # Check the status
$ grep nhc_version configure.ac       # Verify the 'dev' version
m4_define([nhc_version], [1.5])
$ ./autogen.sh                # Undocumented build requirement
$ cd ..
$ mv nhc lbnl-nhc-1.5         # Rename the source folder
$ tar czf lbnl-nhc-1.5.tar.gz lbnl-nhc-1.5
$ rpmbuild -ta lbnl-nhc-1.5.tar.gz
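
The version number in the steps above is typed by hand in three places. As a rough sketch only (the helper name nhc_version_from is my own invention and not part of NHC), the version could instead be read from the m4_define line in configure.ac, assuming that line keeps the format shown by the grep output above:

```shell
#!/bin/sh
# Hypothetical helper, not part of NHC: print the version found in an
# "m4_define([nhc_version], [X.Y])" line. Reads the named file(s), or
# standard input if no file is given. Prints nothing if no such line exists.
nhc_version_from() {
    sed -n 's/^m4_define(\[nhc_version\], *\[\([^]]*\)\]).*/\1/p' "$@"
}

# Possible use, run from the parent directory of the 'nhc' checkout
# after ./autogen.sh has been run (same commands as in the steps above):
#   version=$(nhc_version_from nhc/configure.ac)
#   mv nhc "lbnl-nhc-$version"
#   tar czf "lbnl-nhc-$version.tar.gz" "lbnl-nhc-$version"
#   rpmbuild -ta "lbnl-nhc-$version.tar.gz"
```

If the format of that line in configure.ac ever changes, the helper prints an empty string, so it would be wise to check that $version is non-empty before renaming anything.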

The resulting RPM package is:

~/rpmbuild/RPMS/noarch/lbnl-nhc-1.5-0.82.gf8dc.el8.noarch.rpm

I've added those steps to my Slurm Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#node-health-check

Any comments?

Thanks,
Ole

--
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
