You might try looking at a partition QOS with the GrpTRESMins or GrpTRESRunMins limits: https://slurm.schedmd.com/resource_limits.html

There are a bunch of options which may do what you want.
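Something along these lines might get you started (an untested sketch; the QOS name "timebudget", the partition line, and the 40-minute value are just placeholders matching your example):

# QOS that caps the TRES-minutes of concurrently running jobs
sacctmgr add qos timebudget
sacctmgr modify qos timebudget set GrpTRESRunMins=cpu=40

# Attach it to the partition as a partition QOS (slurm.conf)
PartitionName=work Nodes=node[01-04] State=UP QOS=timebudget

# Also needed in slurm.conf so the limits are actually enforced
AccountingStorageEnforce=limits,qos

Keep in mind that GrpTRESRunMins counts TRES-minutes of the running jobs (e.g. CPUs times remaining minutes), so cpu=40 only maps to 40 wall-clock minutes for single-CPU jobs, and a Grp* limit applies to all jobs under the QOS combined rather than per user. GrpTRESMins works similarly but counts accumulated usage instead of what is currently running.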

-Paul Edmon-

On 3/10/2021 9:13 AM, Marcel Breyer wrote:

Greetings,

We know about the SLURM limit *MaxSubmitJobsPerUser*, which restricts how many jobs a user can have submitted at any given time.
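(For reference, that limit is typically set on a QOS, roughly like the following, where "normal" is just whichever QOS the jobs run under:)

sacctmgr modify qos normal set MaxSubmitJobsPerUser=2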

We would like a similar policy where the total time of all of a user's jobs cannot exceed a certain limit.

For example (normal *MaxSubmitJobsPerUser = 2*):

srun --time 10 ...
srun --time 20 ...
srun --time 10 ... <- fails since only 2 jobs are allowed per user


However, we want something like (for a maximum aggregate time of e.g. 40mins):

srun --time 10 ...
srun --time 20 ...
srun --time 10 ...
srun --time 5 ... <- fails since the total job times exceed 40mins


At the same time, another allowed allocation pattern could be:

srun --time 5 ...
srun --time 5 ...
srun --time 5 ...
srun --time 5 ...
srun --time 5 ...
srun --time 5 ...
srun --time 5 ...
srun --time 5 ...
srun --time 5 ... <- fails since the total job times exceed 40mins (however, once the first job has completed, this job can be submitted normally)


In essence, we would like a policy that works with the FIFO scheduler (so that we don't have to configure a more complex scheduler) and guarantees that another user gets a chance to access a machine after at most X time units (40mins in the example above).

With the *MaxSubmitJobsPerUser* option we would either have to allow only a very small number of jobs (penalizing users who divide their computation into many small sub-jobs) or X would become rather large (num_jobs * max_wall_time; e.g. allowing 8 jobs with a 20-minute wall time limit already means X = 160mins).

Is there an option in SLURM that mimics such a behavior?

With best regards,
Marcel Breyer
