On 8/19/25 21:25, Jennings, Michael E via slurm-users wrote:
> Have you by chance given the `dev` branch a try?  All our production servers
> currently run `lbnl-nhc-1.5-0.82.gf8dc.el8.noarch` built from the `dev` branch,
> have been for some time now, and it's been rock solid.  Our RHEL-based clusters
> also use this version.  Our HPE/Cray Shasta clusters, including our largest
> (classified) clusters Crossroads, Tycho, and Venado, use a variant.  (Long
> story short, I've merged in all my changes into a separate branch, but the
> reverse is not yet true.)  This variant is, at present, COS/SLES-specific, but
> it has quite a few useful additional checks (many of them Cray-centric)
> contributed by other LANL folks that I haven't had a chance to upstream yet.

I've built the lbnl-nhc-1.5-0.82.gf8dc.el8.noarch RPM as described in my previous mail and tested it. Unfortunately, LBNL NHC version 1.5-0.82.gf8dc fails to recognize the NHC_RM=slurm setting in nhc.conf. We get this error:

/usr/libexec/nhc/node-mark-online: Unsupported RM detected in /usr/libexec/nhc/node-mark-online: ""

I created a new issue https://github.com/mej/nhc/issues/165

We need this issue to be fixed before we can deploy NHC 1.5 :-(

Any suggestions?

Thanks,
Ole

--
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
