In the institution where I work we have so far managed without mandatory
wallclock limits (a policy decided well before I joined the organization),
which was workable because the cluster was only lightly utilized.

Now that is changing: more jobs are being submitted, and larger ones. I
would therefore like to introduce wallclock limits so that Slurm can
schedule jobs more efficiently, in particular with backfill.
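For concreteness, this is roughly the kind of change I have in mind in
slurm.conf (the partition name and the actual limits are just placeholders
I made up):

```ini
# slurm.conf sketch: cap runtime on the partition and give jobs that do
# not request a time limit a modest default, so backfill has real numbers
# to work with. Name and values are placeholders, not a recommendation.
PartitionName=batch Nodes=ALL Default=YES State=UP MaxTime=2-00:00:00 DefaultTime=04:00:00
```

Setting DefaultTime well below MaxTime nudges users to think about their
requests, since jobs that omit --time get the short default.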

My concern is that our users are not used to such limits, so I want to make
the transition easy for them and head off common complaints. I anticipate
one of them would be "my job was cancelled even though there were enough
idle nodes and no other job queued after mine" (cluster utilization is
increasing, but it is not yet always full, as it has been at most other
sites I know).

So my question is: is it possible to implement "soft" wallclock limits in
Slurm, i.e. limits that are not enforced unless necessary to run more jobs?
In other words, can the preemptability of a job be changed only after some
time has passed? I can think of ways to hack this together myself with cron
or at jobs, and that might be easy enough, but I am not sure I can make it
robust enough to cover all situations. So I am looking for something either
Slurm-native or, if external, already field-tested by someone else, so that
at least the worst kinks have been ironed out.
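To make the cron idea concrete, here is the kind of sketch I had in mind.
It assumes QOS-based preemption is configured (PreemptType=preempt/qos,
with a QOS that the "normal" QOS is allowed to preempt); the QOS name
"preemptible", the 24-hour soft limit, and the whole mechanism are my own
assumptions, not something I have tested:

```python
"""Cron sketch: after a hypothetical 'soft' wallclock limit, move running
jobs into a preemptible QOS so the scheduler may reclaim their nodes.
Assumes QOS-based preemption is set up; names/limits are placeholders."""
import subprocess

SOFT_LIMIT_SECS = 24 * 3600  # hypothetical soft limit: 24 hours


def parse_elapsed(elapsed: str) -> int:
    """Convert squeue elapsed time ([days-]HH:MM:SS or MM:SS) to seconds."""
    days = 0
    if "-" in elapsed:
        d, elapsed = elapsed.split("-", 1)
        days = int(d)
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hour (and minute) fields
    h, m, s = parts
    return (days * 24 + h) * 3600 + m * 60 + s


def demote_overdue_jobs():
    """List running jobs (%i=jobid, %M=elapsed, %q=QOS) and flip those past
    the soft limit into the preemptible QOS. Needs operator privileges."""
    out = subprocess.run(
        ["squeue", "-t", "RUNNING", "-h", "-o", "%i %M %q"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        jobid, elapsed, qos = line.split()
        if qos != "preemptible" and parse_elapsed(elapsed) > SOFT_LIMIT_SECS:
            subprocess.run(
                ["scontrol", "update", f"JobId={jobid}", "QOS=preemptible"],
                check=True,
            )

# Intended to be called from cron every few minutes: demote_overdue_jobs()
```

This is exactly the kind of thing I worry about making robust (restarts,
requeued jobs, races with the scheduler), hence my question.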

Thanks in advance for any suggestions you may provide!
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
