Hi Diego,

On 7/23/21 8:16 AM, Diego Zuccato wrote:
The Configless Slurm (https://slurm.schedmd.com/configless_slurm.html) from 20.02 makes distribution of slurm.conf really simple.
Eager to see it in Debian :)

IMHO, there ought to be a community effort to provide up-to-date Slurm packages for Debian (and Ubuntu), just like a colleague did for the EPEL repository for RHEL and derivatives ;-) We run CentOS and can trivially build new RPMs from the Slurm source tar-balls.

For monitoring the state of compute nodes and their jobs, I recommend "pestat" from https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
I use "pestat -F" many times every day to see if any jobs are misbehaving.

I'll have a look. I'm also setting up Zabbix for more general monitoring, but I'm not really OK with it yet (for example, I still can't understand how I can exclude some metrics from a host that got 'em added by a template... When I have enough time I'll find a way :) ). Maybe pestat can be added to the Zabbix metrics...

Did you check out what pestat can do (and maybe not do) for you? If you have any suggestions for improving pestat, I'd be glad to see what I can do.

/Ole
