Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-08 Thread Chris Samuel
On Friday, 9 February 2018 2:49:52 AM AEDT Patrick Goetz wrote:

> What is TRES?

Trackable resources (CPUs, memory, GPUs, etc.): https://slurm.schedmd.com/tres.html

A lot more flexible than the old system of just tracking CPUs.

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
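As a quick illustration of how TRES shows up in practice (a sketch, not from this thread; the gres name is an example): GPUs become a tracked TRES by listing them in slurm.conf, and the TRES known to the accounting database can then be listed with sacctmgr.

    # slurm.conf: track GPU gres as a TRES in addition to the built-in
    # cpu/mem/energy/node types
    AccountingStorageTRES=gres/gpu

    # list the TRES the accounting database currently knows about
    sacctmgr show tres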

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-08 Thread Patrick Goetz
What is TRES?

On 02/06/2018 06:03 AM, Christopher Samuel wrote:
> On 06/02/18 21:40, Matteo F wrote:
>> I've tried to limit the number of running jobs using Qos ->
>> MaxJobsPerAccount, but this wouldn't stop a user from just filling up
>> the cluster with fewer (but bigger) jobs.
>
> You probably want to look

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-07 Thread Fulcomer, Samuel
We use GrpTRESRunMins for this, with the idea that it's OK for users to occupy lots of resources with short-running jobs, but not so much with long-running jobs.

On Wed, Feb 7, 2018 at 8:41 AM, Bill Barth wrote:
> Of course, Matteo. Happy to help. Our job completion script is:
>
> #!/bin/bash
>
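A minimal sketch of that approach (account name and number are made up, not from Brown's setup): set GrpTRESRunMins on an association with sacctmgr, and Slurm stops starting new jobs for that account once the sum of allocated CPUs times remaining time-limit minutes across its running jobs reaches the cap.

    # Example: cap account "physics" at 1,000,000 outstanding CPU-minutes
    # (sum over running jobs of allocated CPUs x remaining time limit)
    sacctmgr modify account physics set GrpTRESRunMins=cpu=1000000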

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-07 Thread Bill Barth
Of course, Matteo. Happy to help. Our job completion script is:

    #!/bin/bash
    OUTFILE=/var/log/slurm/tacc_jobs_completed
    echo "$JOBID:$UID:$ACCOUNT:$BATCH:$START:$END:$SUBMIT:$PARTITION:$LIMIT:$JOBNAME:$JOBSTATE:$NODECNT:$PROCS" >> $OUTFILE
    exit 0

and our config settings (from scontrol show co
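For context, a completion script like this is normally hooked in via the jobcomp/script plugin; a minimal example of the relevant slurm.conf lines (the path is illustrative, not TACC's actual value):

    JobCompType=jobcomp/script
    JobCompLoc=/usr/local/sbin/jobcomp.sh

The plugin runs the script when a job completes, with fields such as JOBID, UID, ACCOUNT, JOBSTATE, PARTITION, NODECNT and PROCS exported in the script's environment, which is what the echo line above reads.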

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Matteo F
Thanks Bill, I really appreciate the time you spent giving this detailed answer. I will have a look at the plugin system as the integration with out accounting system would be a nice feature. @Chris thanks, I've had a look GrpTRES but I'll probably go with the Spank route. Best, Matteo On 6 Febr

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Bill Barth
Chris probably gives the Slurm-iest way to do this, but we use a Spank plugin that counts the jobs that a user has in queue (running and waiting) and sets a hard cap on how many they can have. This should probably be scaled to the size of the system and the partition they are submitting to, but
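Not the TACC SPANK plugin itself (which is written in C), but the core check it describes can be sketched as a shell wrapper around sbatch; the cap, wrapper path and message below are made up for illustration.

    #!/bin/bash
    # Illustrative sbatch wrapper: refuse submission once the user already
    # has MAX_JOBS jobs (pending or running) in the queue.
    MAX_JOBS=50
    njobs=$(squeue -h -u "$USER" -t pending,running | wc -l)
    if [ "$njobs" -ge "$MAX_JOBS" ]; then
        echo "Submission refused: $njobs jobs already queued (limit $MAX_JOBS)" >&2
        exit 1
    fi
    exec /usr/bin/sbatch "$@"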

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Christopher Samuel
On 06/02/18 21:40, Matteo F wrote:
> I've tried to limit the number of running jobs using Qos ->
> MaxJobsPerAccount, but this wouldn't stop a user from just filling up
> the cluster with fewer (but bigger) jobs.

You probably want to look at what you can do with the slurmdbd database and associations. Th
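A small, made-up example of the association route described here (user, account and cap are illustrative; slurm.conf needs AccountingStorageEnforce=limits for the limit to take effect):

    # cap this user's association at 64 CPUs in use at any one time
    sacctmgr modify user where name=alice account=research set GrpTRES=cpu=64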

[slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Matteo F
Hello there. I've just set up a small Slurm cluster for our on-premise computation needs (nothing too exotic, just a bunch of R scripts). The system "works" in the sense that users are able to submit jobs, but I have an issue with resource management: a single user can consume all resources of
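The replies above settle on per-association or per-QOS limits; as one illustrative variant (QOS name and number are made up, and this particular option is not named in the thread), a per-user TRES cap can also be placed on a QOS:

    # limit any single user of the "normal" QOS to 32 CPUs at a time
    sacctmgr modify qos normal set MaxTRESPerUser=cpu=32
    # slurm.conf must include qos in AccountingStorageEnforce for this
    # to be enforced, e.g. AccountingStorageEnforce=limits,qos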