Hi Loris.

On 23/07/2021 09:05, Loris Bennett wrote:

> We use both Zabbix and pestat.  Zabbix gives us general information on
> the state of the nodes and file systems, and we have added some Slurm
> metrics, such as number of jobs pending, amount of memory pending,
> number of GPUs pending, etc.

Did you write a module for Zabbix or use a pre-made one?
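
For reference, this is roughly what I would expect a home-made collector for
the first two metrics to look like. It is only a sketch, not what Loris
actually runs: the script name, the function names and the metric keys are
made up, and I am assuming that `squeue --format="%i|%m"` prints the requested
memory either as a bare number of megabytes or with a K/M/G/T suffix.

```python
#!/usr/bin/env python3
"""Hypothetical collector: summarize pending Slurm jobs for a monitoring agent."""
import subprocess
import sys

# Multipliers to convert a suffixed memory value to megabytes.
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def mem_to_mb(field):
    """Convert '4G' or '4000M' to megabytes (assumption: bare numbers are MB)."""
    field = field.strip()
    if not field:
        return 0.0
    unit = field[-1].upper()
    if unit in _UNIT_TO_MB:
        return float(field[:-1]) * _UNIT_TO_MB[unit]
    return float(field)


def summarize_pending(squeue_output):
    """Count lines of 'jobid|memory' and add up the requested memory."""
    jobs = 0
    mem_mb = 0.0
    for line in squeue_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _job_id, mem = line.split("|", 1)
        jobs += 1
        mem_mb += mem_to_mb(mem)
    return {"jobs_pending": jobs, "mem_pending_mb": mem_mb}


def read_squeue():
    """Ask squeue for pending jobs only, without a header line."""
    result = subprocess.run(
        ["squeue", "--noheader", "--states=PENDING", "--format=%i|%m"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print a single value, selected by the metric name given as argument.
    metric = sys.argv[1] if len(sys.argv) > 1 else "jobs_pending"
    print(summarize_pending(read_squeue())[metric])
```

Something like this could then be hooked into the Zabbix agent through a
UserParameter entry that calls the script with the metric name, but I have
not tried that with 3.4.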

> This has been quite handy, although I find Zabbix a bit tricky to
> configure.  This may be because (a) we are stuck on version 3.4 due to
> the PHP dependency on CentOS 7 and (b) I only work with Zabbix very
> irregularly and so always have to start more or less from scratch.

Quite a common problem, it seems. I'd probably try to decouple the PHP part
from the actual server by moving it to a newer VM.

> pestat on the other hand gives us more information about what individual
> jobs on individual nodes are up to at a given point in time.  I don't
> quite see how one could integrate pestat itself directly into Zabbix, as
> it is more geared to producing a report, but maybe Ole has ideas :-)

How to use the collected data is one of the big open problems in IT :)

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
