Re: [slurm-users] monitoring and accounting

2023-06-11 Thread Andrew Elwell
On Fri, 2 June 2023, 22:03 Jörg Striewski, wrote: > Hi, we use grafana with influx, it is easy to install and works fine > Hi Jörg, Are your slurm to influx scripts publicly available anywhere? I do something similar for squeue via python subprocess to call squeue -M all -a -o "%P,%a,%u,%D,%q,

[slurm-users] acct_gather_profile / influxdb

2020-11-03 Thread Andrew Elwell
Hi Gang, I'm soliciting support for https://bugs.schedmd.com/show_bug.cgi?id=8473 now that the rc1 for Slurm 20.11 has just been announced. Are there other sites out there using this plugin? Obviously NeSI are as they submitted the patch/bug (Thanks!), and uni-koeln around last December here on t

Re: [slurm-users] Slurmctld and log file

2020-09-09 Thread Andrew Elwell
As an aside, I've seen on one of the talk slides that using systemctl reload is a Bad Thing to do with logrotation for slurm. Simply send SIGUSR2 (or HUP for pre-17.11 versions, apparently): https://bugs.schedmd.com/show_bug.cgi?id=4393 Andrew
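For anyone wiring this into a rotation hook, here is a minimal sketch of the signal approach in Python. It is an illustration, not something taken from the bug report: the helper name signal_daemon is made up, and the pidfile location is an assumption (check SlurmctldPidFile in your slurm.conf).

```python
import os
import signal


def signal_daemon(pidfile: str, sig: int = signal.SIGUSR2) -> int:
    """Read a PID from pidfile, send sig to that process, and return the PID.

    Hypothetical helper for illustration. SIGUSR2 is the signal mentioned
    above for log rotation on Slurm 17.11 and later; pass signal.SIGHUP
    for older versions.
    """
    with open(pidfile) as fh:
        pid = int(fh.read().strip())
    # os.kill only delivers the signal; it does not terminate the daemon
    # unless the signal's disposition in that process says so.
    os.kill(pid, sig)
    return pid
```

A post-rotation step could then call signal_daemon("/var/run/slurmctld.pid"), where that path is assumed rather than taken from the thread, and it needs to run as a user allowed to signal slurmctld.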

[slurm-users] Alternatives for MailProg

2020-08-26 Thread Andrew Elwell
Hi folks, I'm getting fed up receiving out-of-office replies to slurm job state mails. Given that by default slurmctld just calls /bin/mail (aka mailx on our systems) which doesn't allow command line options to add headers such as 'Auto-generated: auto-submitted' to help educate auto responders,

Re: [slurm-users] Efficiency of the profile influxdb plugin for graphing live job stats

2020-02-22 Thread Andrew Elwell
Just realised after I sent message > Measurement: acct_gather_profile_task Tags: job, step, task, host Fields: CPUTime, CPUUtilization, CPUFrequency, RSS, VMSize, Pages, ReadMB, WriteMB Timestamp It'd also be useful to (if available) capture "cluster" as a tag too - we have three clusters

Re: [slurm-users] Efficiency of the profile influxdb plugin for graphing live job stats

2020-02-22 Thread Andrew Elwell
On Sat, 14 Dec 2019 at 04:25, Lech Nieroda wrote: [OK, so I'm a bit lagged finding this] > I’ve been tinkering with the acct_gather_profile/influxdb plugin a bit in > order to visualize the cpu and memory usage of live jobs. > Both the influxdb backend and Grafana dashboards seem like a perfec

Re: [slurm-users] slurm 17.02.10 mail problem

2018-08-11 Thread Andrew Elwell
On Thu., 9 Aug. 2018, 18:35 Marcel Sommer, wrote: > Hello, > > we compiled slurm version 17.02.10 a while ago and we have the problem > that slurm sends no e-mails. Nullmailer is installed on the systems and > works fine and the parameter MailProg=/usr/sbin/nullmailer-send is set > in the slurm.c

Re: [slurm-users] ntpd or chrony?

2018-01-14 Thread Andrew Elwell
> What are people's thoughts and what are people using? Since chrony is coming installed by default and for "normal" machines we don't need any of the esoteric features of ntpd, I've been switching over to chrony if it's the OS default. (I guess it's now the same as 'why do I need to run sendmail,

[slurm-users] [slurm 17.02] select/cray plugin from non-crays

2018-01-10 Thread Andrew Elwell
Hi folks, We've just upgraded to slurm 17.02.9 (native) on our Crays, but can't get sinfo to work on them anymore from a non-cray "sinfo: error: Cluster 'galaxy' has an unknown select plugin_id 108" On the Crays we have aelwell@galaxy-int:~/testjobs/native$ grep -i select /etc/opt/slurm/slurm.co