Hello,

We had a user submit a large number of array jobs with short run times
(20-80 seconds, mostly toward the low end), and slurmctld fell behind on RPCs
trying to handle them. It was awkward to set ArrayTaskThrottle=5 on each of
the queued array jobs while slurmctld was struggling with the RPC load.
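
For context, the per-job workaround amounts to something like the sketch
below. It only builds the command lines rather than running them. The job IDs
are made up (in practice they would come from something like
`squeue -h -u <user> -o %F`), and the helper name `throttle_commands` is purely
illustrative.

```python
def throttle_commands(array_job_ids, limit=5):
    """Build one `scontrol update` command per distinct array job ID."""
    commands = []
    seen = set()
    for job_id in array_job_ids:
        job_id = str(job_id).strip()
        if not job_id or job_id in seen:
            continue  # skip blank lines and repeated base job IDs
        seen.add(job_id)
        commands.append(
            ["scontrol", "update", f"JobId={job_id}", f"ArrayTaskThrottle={limit}"]
        )
    return commands


if __name__ == "__main__":
    # Made-up job IDs standing in for squeue output.
    for cmd in throttle_commands(["1001", "1002", "1001"]):
        print(" ".join(cmd))
```

Each of those commands is another RPC to slurmctld, which is part of why this
is painful when the controller is already behind.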

I would like to create a QOS with MaxJobsPerUser=50 that I can quickly add to
a user to throttle their jobs, but:

1)      Adding a QOS to the user does not affect queued jobs, so I still have
to get all of the user's job IDs and modify each one directly.

2)      I queued a test array job with the QOS set, and it still runs 100
array tasks at a time (the ArrayTaskThrottle value I set on the job) instead
of limiting the user to 50 jobs.

3)      I tried adding the OverPartQOS flag to see if that changed the
behavior, but it did not seem to do anything. The test cluster I ran this on
doesn't have any other QOSes defined, but our production cluster does have a
partition QOS in place that limits a single user to about 80% of the CPUs
with MaxTRESPerUser.
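
The setup described in the points above corresponds roughly to the sequence
sketched below. It only builds the commands and does not run them. The QOS name
`throttle50`, the user name `alice`, and the job IDs are placeholders, and the
exact invocations are an approximation rather than a copy of what was typed.

```python
def qos_throttle_commands(user, qos="throttle50", max_jobs=50, job_ids=()):
    """Build the sacctmgr/scontrol commands for an emergency throttle QOS."""
    cmds = [
        # Create the QOS, then set the per-user running-job limit on it.
        ["sacctmgr", "-i", "add", "qos", qos],
        ["sacctmgr", "-i", "modify", "qos", qos, "set", f"MaxJobsPerUser={max_jobs}"],
        # Allow the user to use the new QOS.
        ["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
         "set", f"qos+={qos}"],
    ]
    # Already-queued jobs keep their old QOS, so each one is updated directly.
    for job_id in job_ids:
        cmds.append(["scontrol", "update", f"JobId={job_id}", f"QOS={qos}"])
    return cmds


if __name__ == "__main__":
    for cmd in qos_throttle_commands("alice", job_ids=["1001", "1002"]):
        print(" ".join(cmd))
```

The last loop is the step from point 1 that I would like to avoid.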

Is there a quick way to limit how many jobs a specific user can run at one
time, on the cluster or in a partition, for when we need to throttle them back
in an emergency but don't want to kill their jobs outright?

Thanks.
