At the institution where I work, we have so far managed without mandatory wallclock limits (a policy decided well before I joined), which was workable because the cluster was only lightly utilized.
That is now changing: more jobs are being submitted, and they are larger. I would therefore like to introduce wallclock limits so that Slurm can schedule more efficiently, including via backfill.

My concern is that this user base is not used to limits, so I want to ease the transition and head off common complaints. I anticipate one will be "my job was cancelled even though there were enough idle nodes and no other job queued after mine" (utilization is increasing, but the cluster is not yet always full, unlike most other sites I know).

So my question is: is it possible to implement "soft" wallclock limits in Slurm, i.e. limits that are not enforced unless necessary to run more jobs? In other words, can a job's preemptability be changed only after some time has passed?

I can think of ways to hack this together myself with cron or at jobs, and that might be easy enough, but I am not sure I can make it robust enough to cover all situations. So I am looking for something either Slurm-native or, if external, already field-tested by someone else, so that at least the worst kinks have been ironed out. Thanks in advance for any suggestions you may provide!
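For concreteness, here is a minimal sketch of the kind of cron-based hack I have in mind (this is my own hypothetical policy, not a Slurm feature): treat each job's limit as soft, and only act on over-limit jobs when something is actually pending. The helper names and the grace parameter are mine; only the `squeue`/`scancel` calls in the commented wiring are real Slurm commands.

```python
# Sketch of a cron-driven "soft wallclock limit" enforcer (hypothetical policy).
# Idea: every few minutes, look at running jobs; if any exceed their (soft)
# limit AND there are pending jobs that could use the freed nodes, cancel them.

def parse_slurm_time(s):
    """Convert a Slurm elapsed/limit string ([days-]HH:MM:SS or MM:SS) to seconds."""
    days = 0
    if "-" in s:
        d, s = s.split("-", 1)
        days = int(d)
    parts = [int(p) for p in s.split(":")]
    while len(parts) < 3:          # pad MM:SS (or SS) up to H:M:S
        parts.insert(0, 0)
    h, m, sec = parts
    return (days * 24 + h) * 3600 + m * 60 + sec

def over_soft_limit(elapsed, limit, grace=0):
    """True if a job's elapsed time exceeds its soft limit plus a grace period."""
    return parse_slurm_time(elapsed) > parse_slurm_time(limit) + grace

def jobs_to_preempt(running, pending):
    """running: list of (jobid, elapsed, limit) tuples; pending: list of jobids.
    Only select over-limit jobs when someone is actually waiting."""
    if not pending:
        return []                  # nobody queued: let over-limit jobs run on
    return [jid for jid, elapsed, limit in running
            if over_soft_limit(elapsed, limit)]

# Wiring it up from cron (untested sketch using real squeue/scancel calls):
# import subprocess
# running = [tuple(line.split()) for line in subprocess.check_output(
#     ["squeue", "-h", "-t", "RUNNING", "-o", "%A %M %l"], text=True).splitlines()]
# pending = subprocess.check_output(
#     ["squeue", "-h", "-t", "PENDING", "-o", "%A"], text=True).split()
# for jid in jobs_to_preempt(running, pending):
#     subprocess.run(["scancel", jid])
```

Making this robust is exactly what worries me: races between the cron tick and the scheduler, jobs with `UNLIMITED` limits, array jobs, and so on, which is why I would prefer something already field-tested.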
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com