Hello,

We had a user submit a large number of array jobs with short actual run times (20-80 seconds, but mostly at the low end), and slurmctld was falling behind on RPC calls trying to handle them. It was a bit awkward trying to slap arraytaskthrottle=5 on each of the queued array jobs while slurmctld was struggling with the RPC load.
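A minimal sketch, in Python, of scripting the arraytaskthrottle change instead of doing it job by job. It assumes `squeue -h -u <user> -t PENDING -o "%F %K"` prints one base array job ID (%F) and task index (%K) per line, with N/A in %K for non-array jobs; the function names are hypothetical. It issues one `scontrol update` per array job rather than per task, which keeps the extra RPCs to a minimum while slurmctld is already busy, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def pending_array_job_ids(squeue_output: str) -> list[str]:
    """Parse `squeue -o "%F %K"` lines into unique base array job IDs.

    %F is the array's base job ID; %K is the task index, assumed to be
    reported as N/A for ordinary (non-array) jobs, so those are skipped.
    """
    ids: list[str] = []
    for line in squeue_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[1] == "N/A":
            continue
        if fields[0] not in ids:  # one array can appear on several lines
            ids.append(fields[0])
    return ids


def throttle_commands(job_ids: list[str], limit: int = 5) -> list[list[str]]:
    """One `scontrol update` per array job, capping its concurrently running tasks."""
    return [
        ["scontrol", "update", f"JobId={jid}", f"ArrayTaskThrottle={limit}"]
        for jid in job_ids
    ]


def throttle_user(user: str, limit: int = 5, dry_run: bool = True) -> None:
    """Look up the user's pending array jobs and throttle each one."""
    out = subprocess.run(
        ["squeue", "-h", "-u", user, "-t", "PENDING", "-o", "%F %K"],
        capture_output=True, text=True, check=True,
    ).stdout
    for cmd in throttle_commands(pending_array_job_ids(out), limit):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `throttle_user("someuser")` only prints the commands, while `throttle_user("someuser", dry_run=False)` runs them; it needs to be run by an account that is allowed to modify other users' jobs.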
I'm looking to make a QOS with MaxJobsPerUser=50 that I can quickly add to a user to throttle their jobs, but:

1) Adding a QOS to the user does not affect queued jobs, so I still have to get all of the user's job IDs and modify each one directly.
2) I queued up a test job with the QOS set, and it still runs 100 jobs at a time (what I set arraytaskthrottle to in the job) instead of limiting the user to 50 jobs.
3) I tried adding the OverPartQOS flag to see if that changed the behavior, but it did not seem to do anything.

The test cluster I ran this on doesn't have any other QOSes defined, but our production cluster does have a partition QOS in place that limits single users to about 80% of the CPUs with MaxTRESPerUser.

Is there a quick way to limit how many jobs a specific user can run at one time on the cluster, or in a partition, if we need to throttle them back in an emergency but don't want to kill their jobs outright?

Thanks.
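A sketch of the QOS route described above, generating the `sacctmgr` and `scontrol` commands rather than running them. The QOS name `throttle50`, the user and the job IDs are hypothetical. It assumes the cluster enforces QOS limits at all (AccountingStorageEnforce in slurm.conf including `limits`/`qos`), and the user's jobs still have to carry the QOS, either via `--qos` on new submissions or via the `scontrol update ... QOS=` commands generated here for jobs already queued.

```python
import shlex

QOS_NAME = "throttle50"  # hypothetical QOS name


def qos_setup_commands(qos: str, max_jobs: int) -> list[list[str]]:
    """sacctmgr calls (-i = commit without prompting) to create the QOS and cap running jobs per user."""
    return [
        ["sacctmgr", "-i", "add", "qos", qos],
        ["sacctmgr", "-i", "modify", "qos", qos, "set", f"MaxJobsPerUser={max_jobs}"],
    ]


def apply_to_user_commands(user: str, qos: str, pending_job_ids: list[str]) -> list[list[str]]:
    """Allow the QOS on the user's association, then move already-queued jobs onto it."""
    cmds = [["sacctmgr", "-i", "modify", "user", "where", f"name={user}", "set", f"qos+={qos}"]]
    cmds += [["scontrol", "update", f"JobId={jid}", f"QOS={qos}"] for jid in pending_job_ids]
    return cmds


if __name__ == "__main__":
    # Dry run: print what would be executed for a hypothetical user and job IDs.
    plan = qos_setup_commands(QOS_NAME, 50) + apply_to_user_commands("alice", QOS_NAME, ["1234", "9012"])
    for cmd in plan:
        print(shlex.join(cmd))
```

Keeping command generation separate from execution makes it easy to review the list before touching a cluster that is already under RPC pressure.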