Thanks Bill, I really appreciate the time you spent giving this detailed answer. I will have a look at the plugin system, as the integration with our accounting system would be a nice feature.
@Chris thanks, I've had a look at GrpTRES but I'll probably go with the Spank route.

Best,
Matteo

On 6 February 2018 at 13:58, Bill Barth <bba...@tacc.utexas.edu> wrote:
> Chris probably gives the Slurm-iest way to do this, but we use a Spank
> plugin that counts the jobs that a user has in queue (running and waiting)
> and sets a hard cap on how many they can have. This should probably be
> scaled to the size of the system and the partition they are submitting to,
> but on Stampede 2 (4200 KNL nodes and 1736 SKX nodes) we set this, across
> all queues, to about 50, which has been our magic number across numerous
> schedulers over the years on systems ranging from hundreds of nodes to
> Stampede 1 with 6400. Some users get more by request and most don’t even
> bump up against the limits. We’ve started to look at using TRES on our
> test system, but we haven’t gotten there yet. Our use of the DB is
> minimal, and our process to get every user into it when their TACC
> account is created is not 100% automated yet (we use the job completion
> plugin to create a flat file with job records which our local accounting
> system consumes to decrement allocation balances, if you care to know).
>
> Best,
> Bill.
>
> --
> Bill Barth, Ph.D., Director, HPC
> bba...@tacc.utexas.edu | Phone: (512) 232-7069
> Office: ROC 1.435 | Fax: (512) 475-9445
>
>
> On 2/6/18, 6:03 AM, "slurm-users on behalf of Christopher Samuel"
> <slurm-users-boun...@lists.schedmd.com on behalf of ch...@csamuel.org>
> wrote:
>
> On 06/02/18 21:40, Matteo F wrote:
>
> > I've tried to limit the number of running jobs using Qos ->
> > MaxJobsPerAccount, but this wouldn't stop a user from just filling
> > up the cluster with fewer (but bigger) jobs.
>
> You probably want to look at what you can do with the slurmdbd database
> and associations. Things like GrpTRES:
>
> https://slurm.schedmd.com/sacctmgr.html
>
> # GrpTRES=<TRES=max TRES,...>
> # Maximum number of TRES running jobs are able to be allocated in
> # aggregate for this association and all associations which are children
> # of this association. To clear a previously set value use the modify
> # command with a new value of -1 for each TRES id.
> #
> # NOTE: This limit only applies fully when using the Select Consumable
> # Resource plugin.
>
> Best of luck,
> Chris
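
For anyone following along, the GrpTRES limit Chris points at is set on the
association with sacctmgr. A rough example of what that looks like (the
account name "physics" and the numbers are made-up placeholders, not
anything from this thread):

    # Cap everything under the "physics" account to 512 CPUs and 16 nodes
    # of running jobs in aggregate.
    sacctmgr modify account name=physics set GrpTRES=cpu=512,node=16

    # Clear the CPU cap again (per the man page, -1 clears a TRES).
    sacctmgr modify account name=physics set GrpTRES=cpu=-1

    # Check what is currently set on the associations.
    sacctmgr show assoc format=cluster,account,user,grptres

As the man page note above says, the limit only applies fully when the
consumable-resources select plugin is in use.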
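
And for the Spank route, a bare-bones sketch of the idea Bill describes
might look something like the following. This is emphatically not the TACC
plugin: the max_jobs plugin argument, the default cap of 50 and the
fail-open behaviour when slurmctld cannot be reached are my own assumptions
for illustration; only the spank.h/slurm.h calls themselves are the
standard API.

    /*
     * limit_jobs.c -- sketch of a submit-side Spank plugin that counts a
     * user's running + pending jobs and refuses a new submission once a
     * cap is reached.
     *
     * Build (roughly): gcc -shared -fPIC -o limit_jobs.so limit_jobs.c
     * Enable in plugstack.conf, e.g.:
     *   optional /etc/slurm/spank/limit_jobs.so max_jobs=50
     */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <slurm/slurm.h>
    #include <slurm/spank.h>

    SPANK_PLUGIN(limit_jobs, 1);

    static int max_jobs = 50;   /* default cap; overridable via plugin arg */

    int slurm_spank_init(spank_t sp, int ac, char **av)
    {
        /* Only act on the submit/allocate side (sbatch/salloc/srun),
         * never inside slurmstepd on the compute nodes. */
        if (spank_context() != S_CTX_ALLOCATOR &&
            spank_context() != S_CTX_LOCAL)
            return ESPANK_SUCCESS;

        for (int i = 0; i < ac; i++)
            if (strncmp(av[i], "max_jobs=", 9) == 0)
                max_jobs = atoi(av[i] + 9);

        uid_t uid = getuid();
        job_info_msg_t *jobs = NULL;

        /* Ask slurmctld for the job list; if that fails, fail open. */
        if (slurm_load_jobs((time_t) 0, &jobs, SHOW_ALL) != 0)
            return ESPANK_SUCCESS;

        int count = 0;
        for (uint32_t i = 0; i < jobs->record_count; i++) {
            job_info_t *j = &jobs->job_array[i];
            uint32_t state = j->job_state & JOB_STATE_BASE;
            if (j->user_id == uid &&
                (state == JOB_PENDING || state == JOB_RUNNING))
                count++;
        }
        slurm_free_job_info_msg(jobs);

        if (count >= max_jobs) {
            slurm_error("limit_jobs: you already have %d jobs "
                        "queued/running (cap is %d)", count, max_jobs);
            return ESPANK_ERROR;
        }
        return ESPANK_SUCCESS;
    }

A real version would presumably scale the cap per partition and per user,
as Bill mentions, rather than use a single global number.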