Re: [slurm-users] Quickly throttling/limiting a specific user's jobs

Paul Edmon Tue, 22 Sep 2020 17:03:55 -0700

I would look at:

MaxJobs=<max jobs>
   Maximum number of jobs each user is allowed to run at one time in
   this association. This is overridden if set directly on a user.
   Default is the cluster's limit. To clear a previously set value use
   the modify command with a new value of -1.

This is association based, so you could just modify their account directly and set it to something low.
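Here is a minimal sketch of applying that limit from a script. It assumes admin rights, `sacctmgr` on the PATH, and AccountingStorageEnforce including `limits` (association limits are not enforced otherwise). The helper names and the user name `someuser` are hypothetical. By default the script only prints the command (a dry run). Note that `where name=<user>` changes every association that user has. Adding `account=<acct>` to the `where` clause should narrow it to one account.

```python
# Sketch: set (or clear) an association-level MaxJobs limit for one user.
# Assumptions: sacctmgr on PATH, Slurm admin rights, and
# AccountingStorageEnforce containing "limits" so the limit is enforced.
from __future__ import annotations

import shlex
import subprocess


def max_jobs_command(user: str, max_jobs: int) -> list[str]:
    """Build the sacctmgr call; max_jobs=-1 clears a previously set value."""
    if max_jobs < -1:
        raise ValueError("max_jobs must be -1 (clear) or a non-negative count")
    # -i commits immediately instead of prompting for confirmation.
    return ["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
            "set", f"MaxJobs={max_jobs}"]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(max_jobs_command("someuser", 5))   # throttle a (placeholder) user
    run(max_jobs_command("someuser", -1))  # later: clear the limit again
```

Already-running jobs are left alone. The limit only stops further jobs from starting until the user drops below it.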

You can also simply put their pending jobs in a hold state. That way they won't be considered for scheduling, but they won't be outright removed either. Setting fairshare to 0 has the same effect.
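One way to do the hold in bulk is sketched below in Python. It assumes admin rights and `squeue`/`scontrol` on the PATH. It uses squeue's `%F` field (the array's base job ID, or the plain job ID for non-array jobs), assuming that holding the base ID covers all of that array's pending tasks. Running `scontrol release` with the same list should undo it. The helper names and `someuser` are hypothetical, and the script only prints the command for review.

```python
# Sketch: hold (or release) all of one user's pending jobs with one scontrol call.
# Assumptions: squeue/scontrol on PATH and Slurm admin rights.
from __future__ import annotations

import shlex
import shutil
import subprocess


def parse_base_ids(squeue_output: str) -> list[str]:
    """De-duplicate IDs printed by `squeue -h -o %F`, one per line.

    All pending tasks of one array share a base ID, so they collapse
    to a single entry.
    """
    ids = {line.strip() for line in squeue_output.splitlines() if line.strip()}
    return sorted(ids, key=int)


def pending_base_ids(user: str) -> list[str]:
    out = subprocess.run(
        ["squeue", "-u", user, "-t", "PENDING", "-h", "-o", "%F"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_base_ids(out)


def hold_command(job_ids: list[str], release: bool = False) -> list[str]:
    # One comma-separated job list rather than one scontrol call per job.
    return ["scontrol", "release" if release else "hold", ",".join(job_ids)]


if __name__ == "__main__":
    if shutil.which("squeue") is None:
        print("squeue not found; nothing to do")
    else:
        ids = pending_base_ids("someuser")  # placeholder user name
        if ids:
            print(shlex.join(hold_command(ids)))  # review, then run it
```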


-Paul Edmon-

On 9/22/2020 7:58 PM, Brian Andrus wrote:

Well, I know of no way to 'throttle' running jobs. Once they are out the gate, you can't stop them from leaving.
That said, your approach of setting ArrayTaskThrottle is just what you want for any pending jobs.
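The sketch below applies ArrayTaskThrottle to every pending array job of one user. It assumes admin rights and the Slurm CLIs on the PATH. Array jobs are recognized by the underscore squeue prints in `%i` (`<base>_<index>` or `<base>_[range]`). The half-second pause between updates is an arbitrary, hypothetical choice meant to avoid adding to slurmctld's RPC backlog; it is not a Slurm setting. `someuser` and the helper names are placeholders, and the default is a dry run.

```python
# Sketch: set ArrayTaskThrottle on each pending array job of one user.
# Assumptions: squeue/scontrol on PATH and Slurm admin rights.
from __future__ import annotations

import shlex
import shutil
import subprocess
import time


def array_base_ids(squeue_output: str) -> list[str]:
    """Extract array base IDs from `squeue -h -o %i` output.

    Array tasks print as <base>_<index> (or <base>_[range] while pending);
    plain jobs have no underscore and are skipped.
    """
    bases = set()
    for line in squeue_output.splitlines():
        job_id = line.strip()
        if "_" in job_id:
            bases.add(job_id.split("_", 1)[0])
    return sorted(bases, key=int)


def throttle_command(base_id: str, limit: int) -> list[str]:
    return ["scontrol", "update", f"JobId={base_id}", f"ArrayTaskThrottle={limit}"]


def throttle_user(user: str, limit: int, pause: float = 0.5,
                  dry_run: bool = True) -> None:
    out = subprocess.run(
        ["squeue", "-u", user, "-t", "PENDING", "-h", "-o", "%i"],
        check=True, capture_output=True, text=True,
    ).stdout
    for base in array_base_ids(out):
        cmd = throttle_command(base, limit)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
            time.sleep(pause)  # space out RPCs to an already busy slurmctld


if __name__ == "__main__":
    if shutil.which("squeue"):
        throttle_user("someuser", 5)  # placeholder user; dry run by default
```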
As a preventative measure, I imagine you could set the default throttle to 1 and then change it with a job_submit script.
As far as currently running tasks go, well, you have to figure that out. You could kill/requeue them, but that can break things for the user. If their code supports it, they could checkpoint/restart as part of the process.
You can suspend them, but they still sit on the node waiting to be resumed, and the node resources may get assigned to other jobs while they wait.
Brian Andrus


On 9/22/2020 2:33 PM, Ransom, Geoffrey M. wrote:
Hello
We had a user post a large number of array jobs with a short actual run time (20-80 seconds, but mostly at the low end), and slurmctld was falling behind on RPC calls trying to handle the jobs. It was a bit awkward trying to slap ArrayTaskThrottle=5 on each of the queued array jobs while slurmctld was having issues handling the RPC load.
I’m looking to make a QOS with MaxJobsPerUser=50 set that I can quickly add to a user to throttle their jobs, but:
1) Adding a QOS to the user does not affect queued jobs, so I still have to get all of the user's job IDs and modify each one directly.
2) I queued up a test job with the QOS set, and it is still running 100 jobs at a time (what I set ArrayTaskThrottle to in the job) and not limiting the “user” to 50 jobs.
3) I tried adding the flag OverPartQOS to see if that changed the behavior, but it did not seem to do anything. The test cluster I ran this on doesn’t have any other QOSes defined, but our production cluster does have a partition QOS in place limiting single users to about 80% of the CPUs with MaxTRESPerUser.
Is there a quick way to limit how many jobs a specific user can run at one time on the cluster, or in a partition, if we need to throttle them back in an emergency but don’t want to flat out kill their jobs?
Thanks.
