[slurm-users] MariaDB lock problems for sacctmgr delete query

2018-02-13 Thread Jessica Nettelblad
TL;DR: If you get a timeout for the Slurm database, and a longer time limit in InnoDB doesn't help, you might want to consider loosening the lock mode in MariaDB. The long story! So, we’ve just upgraded our main cluster to 17.11.3 and moved our database to MariaDB. There have been some glitches and

Re: [slurm-users] Are these threads actually unused?

2018-02-13 Thread Mike Cammilleri
I should also mention that of course we are aware that R is a single-threaded application, but users can be doing all sorts of things within their R scripting. In this particular case, I believe the user is using the FLARE package. Often they are seeking to do embarrassingly parallel types of t

[slurm-users] Are these threads actually unused?

2018-02-13 Thread Mike Cammilleri
I posted a similar question a couple of months ago regarding CPU utilization, which we figured out: sometimes too many threads on one CPU create a high CPU load, and thus slower compute time, because things are waiting. A more proper allocation should be set in the submit script (e.g. --cpu

Re: [slurm-users] Free Gres resources

2018-02-13 Thread Nadav Toledo
This solution is even better. I am actually using pestat for my own needs as admin. But I originally asked the question in order to enhance slurm_exporter, which is client-side code for Prometheus/Grafana that exports Slurm statistics to be displayed as graphs.

Re: [slurm-users] Free Gres resources

2018-02-13 Thread Ole Holm Nielsen
On 02/13/2018 08:13 AM, Nadav Toledo wrote: > Does anyone know of a way to get the number of idle GPUs per node or for the whole cluster? sinfo -o %G gives the total amount of gres resources for each node. Is there a way to get the idle amount, the same as you can get for CPUs (%C)? Perhaps if one uses lock file li

Re: [slurm-users] Free Gres resources

2018-02-13 Thread Nadav Toledo
Thanks, that might be enough. I will check it out. On 13/02/2018 16:33, Yair Yarom wrote: Hi, I haven't found a direct way. Here I have my own script that parses the output of "scontrol show node" and "scontrol show job", summing up and displaying the allocated g

Re: [slurm-users] Free Gres resources

2018-02-13 Thread Yair Yarom
Hi, I haven't found a direct way. Here I have my own script that parses the output of "scontrol show node" and "scontrol show job", summing up and displaying the allocated gres. Yair. On Tue, Feb 13 2018, Nadav Toledo wrote: > Hello everyone, > > Does anyone know of a way to get the amount of i
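The parsing approach described above might look roughly like the following. This is a minimal sketch and not Yair's actual script: it reads only node-level text, and it assumes each node record carries a Gres= field and a GresUsed= field (as printed by "scontrol -d show node" on some Slurm versions). The sample text, the function names gpu_count and idle_gpus, and the node names are all made up for illustration; real output differs between versions and sites.

```python
import re

# Hypothetical sample text; real "scontrol -d show node" output has many
# more fields and its exact layout depends on the Slurm version.
SAMPLE = """\
NodeName=node01 Arch=x86_64 CoresPerSocket=8
   Gres=gpu:4
   GresUsed=gpu:3(IDX:0-2)
NodeName=node02 Arch=x86_64 CoresPerSocket=8
   Gres=gpu:tesla:2
   GresUsed=gpu:tesla:0(IDX:N/A)
"""


def gpu_count(value):
    """Sum the GPU counts in a Gres-style value such as 'gpu:tesla:2(IDX:0-1)'.

    Entries without an explicit numeric count are ignored.
    """
    # Drop parenthesised suffixes first, since "(IDX:0,2)" may contain commas.
    value = re.sub(r"\(.*?\)", "", value)
    total = 0
    for item in value.split(","):
        parts = item.split(":")
        if parts[0] == "gpu" and parts[-1].isdigit():
            total += int(parts[-1])
    return total


def idle_gpus(text):
    """Return {node name: configured GPUs minus GPUs in use}."""
    totals, used = {}, {}
    node = None
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue
        if key == "NodeName":
            node = value
            totals[node] = 0
            used[node] = 0
        elif node is not None and key == "Gres":
            totals[node] = gpu_count(value)
        elif node is not None and key == "GresUsed":
            used[node] = gpu_count(value)
    return {name: totals[name] - used[name] for name in totals}


if __name__ == "__main__":
    for name, idle in idle_gpus(SAMPLE).items():
        print(name, idle)
```

In practice the text would come from running scontrol on a cluster rather than from a hard-coded string, and the field names should be checked against the installed Slurm version before relying on the result.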

Re: [slurm-users] retrieve the jobs restarted?

2018-02-13 Thread Henry Gérard
Hello all, as a workaround, I finally used an Epilog script to archive the jobs: EpilogSlurmctld=/cm/local/apps/cmd/scripts/epilog-postjob in slurm.conf. The script does: scontrol show job -d $SLURM_JOB_ID >> $JOBS_FILE Hth, Gérard. On 09/02/2018 at 17:58, Henry Gérard wrote: Hello all, we have