> On 10/7/2017 8:21 AM, Josh Catana wrote:
>
> This may have been brought up in the past, but I couldn't find much in my
> message archive.
> What are people using for HPC cluster monitoring and metrics lately? I've
> been low on time to add features to my home grown solution and looking at
> some
So for general monitoring of the cluster usage we use:
https://github.com/fasrc/slurm-diamond-collector
and pipe to Grafana. We also use XDMod:
http://open.xdmod.org/7.0/index.html
As for specific node alerting, we use the old standby of Nagios.
-Paul Edmon-
On 10/7/2017 8:21 AM, Josh Catana wrote:
This may have been brought up in the past, but I couldn't find much in my
message archive.
What are people using for HPC cluster monitoring and metrics lately? I've
been low on time to add features to my home grown solution and looking at
some OTS products.
I'm looking for something that can do mo