On Friday, 9 February 2018 2:49:52 AM AEDT Patrick Goetz wrote:
> What is TRES?
Trackable resources (CPUs, memory, GPUs, etc).
https://slurm.schedmd.com/tres.html
A lot more flexible than the old system of just tracking CPUs.
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
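For anyone following the thread: TRES limits live in the accounting database and are set with sacctmgr. A minimal sketch, assuming a made-up user name and limits (GrpTRES memory is given in MB):

    # list the TRES types this cluster is tracking
    sacctmgr show tres

    # cap what user "alice" can hold across all of her running jobs
    sacctmgr modify user where name=alice set GrpTRES=cpu=64,mem=256000,gres/gpu=2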
What is TRES?
On 02/06/2018 06:03 AM, Christopher Samuel wrote:
On 06/02/18 21:40, Matteo F wrote:
I've tried to limit the number of running jobs using QOS ->
MaxJobsPerAccount, but this wouldn't stop a user from just filling up the
cluster with fewer (but bigger) jobs.
You probably want to look at what you can do with the slurmdbd database and associations.
We use GrpTresRunMins for this, with the idea that it's OK for users to
occupy lots of resources with short-running jobs, but not so much with
long-running jobs.
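A sketch of how that limit can be set, with an invented user/account and an arbitrary number. GrpTRESRunMins caps the sum over running jobs of allocated TRES times remaining minutes, so lots of short jobs fit under it while a few long, large jobs hit the ceiling:

    # let user "alice" in account "proj" hold at most 1,000,000 outstanding CPU-minutes
    # (CPUs allocated x minutes still left to run, summed over her running jobs)
    sacctmgr modify user where name=alice account=proj set GrpTRESRunMins=cpu=1000000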
On Wed, Feb 7, 2018 at 8:41 AM, Bill Barth wrote:
Of course, Matteo. Happy to help. Our job completion script is:
#!/bin/bash
OUTFILE=/var/log/slurm/tacc_jobs_completed
echo "$JOBID:$UID:$ACCOUNT:$BATCH:$START:$END:$SUBMIT:$PARTITION:$LIMIT:$JOBNAME:$JOBSTATE:$NODECNT:$PROCS" >> $OUTFILE
exit 0
and our config settings (from scontrol show config) are:
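For reference, the jobcomp/script mechanism behind a script like this is enabled through two slurm.conf parameters; the path below is only an illustration, not the actual site value:

    JobCompType=jobcomp/script
    JobCompLoc=/usr/local/sbin/log_completed_jobs.sh

Slurm runs the JobCompLoc script once per finished job with the job's details exported as environment variables (JOBID, ACCOUNT, JOBSTATE and so on), which is what the script above reads.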
Thanks Bill, I really appreciate the time you spent giving this detailed
answer.
I will have a look at the plugin system, as integration with our
accounting system would be a nice feature.
@Chris thanks, I've had a look at GrpTRES but I'll probably go with the Spank
route.
Best,
Matteo
On 6 February 2018, Bill Barth wrote:
Chris probably gives the Slurm-iest way to do this, but we use a Spank plugin
that counts the jobs that a user has in queue (running and waiting) and sets a
hard cap on how many they can have. This should probably be scaled to the size
of the system and the partition they are submitting to, but
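For a rough stock-Slurm approximation of such a cap, without writing a plugin, the MaxSubmitJobs family of limits counts a user's pending plus running jobs; an illustrative command, with a placeholder QOS name and an arbitrary number:

    # reject new submissions once a user already has 100 jobs pending or running under this QOS
    sacctmgr modify qos normal set MaxSubmitJobsPerUser=100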
On 06/02/18 21:40, Matteo F wrote:
I've tried to limit the number of running jobs using QOS ->
MaxJobsPerAccount, but this wouldn't stop a user from just filling up the
cluster with fewer (but bigger) jobs.
You probably want to look at what you can do with the slurmdbd database
and associations. There you can set limits such as GrpTRES.
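Before changing anything, the existing associations and their limits can be listed; for example (the field selection here is just one possibility):

    sacctmgr show associations format=Cluster,Account,User,GrpTRES,GrpTRESRunMins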
Hello there.
I've just set up a small Slurm cluster for our on-premise computation needs
(nothing too exotic, just a bunch of R scripts).
The systems "works" if the sense that users are able to submit jobs, but I
have an issue with resources management: a single user can consume all
resources of