Hi all,

I looked back through the mailing list to see whether this had already been asked. There was some mention of /using/ Nagios, but nothing specific. What do people monitor with Nagios? So far we monitor slurmctld, slurmdbd, and MySQL, but there are probably other things worth checking. It might be helpful, for example, to run “scontrol ping” or something similar on our login nodes.
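
A login-node check along those lines could be sketched roughly as follows. This is a minimal Python sketch, not a tested plugin: it assumes that "scontrol ping" reports each controller with the word UP or DOWN (the exact wording may differ between Slurm versions), and the names classify and main are purely illustrative. The exit codes follow the usual Nagios convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check that wraps `scontrol ping`."""
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(output):
    """Map `scontrol ping` text to a (exit code, message) pair.

    Assumption: every controller is reported with the word UP or DOWN.
    """
    states = re.findall(r"\b(UP|DOWN)\b", output)
    if not states:
        return UNKNOWN, "UNKNOWN: could not parse scontrol ping output"
    if all(state == "DOWN" for state in states):
        return CRITICAL, "CRITICAL: no slurmctld is responding"
    if "DOWN" in states:
        return WARNING, "WARNING: at least one slurmctld is DOWN"
    return OK, "OK: all slurmctld instances are UP"


def main():
    """Run `scontrol ping`, print one status line, return the exit code."""
    try:
        proc = subprocess.run(
            ["scontrol", "ping"],
            capture_output=True,
            text=True,
            timeout=10,  # hypothetical limit; tune to the Nagios check timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"UNKNOWN: could not run scontrol: {exc}")
        return UNKNOWN
    code, message = classify(proc.stdout)
    print(message)
    return code
```

To use it as a plugin, add "import sys" and a final "sys.exit(main())" line so that Nagios sees the exit code. Treating a single DOWN controller as WARNING rather than CRITICAL is a judgment call for sites with a backup slurmctld.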

Does anyone have any plugins they’ve written or ideas they can share? Nagios Exchange doesn’t have anything with SLURM anywhere in the name.

Thanks!

-- 
Ryan Novosielski - novos...@rutgers.edu
Sr. Technologist - 973/972.0922 (2x0922)
Office of Advanced Research Computing - MSB C630, Newark
Rutgers, the State University of NJ - RBHS Campus