Mike via slurm-users
<slurm-users@lists.schedmd.com> writes:
> Greetings,
>
> We are new to Slurm and we are trying to better understand why we’re
> seeing high-mem jobs stuck in Pending state indefinitely. Smaller (mem)
> jobs in the queue will continue to pass by the high mem jobs even when we
> bump priority on a pending high-mem job way up. We have been reading over
> the backfill scheduling page and what we think we're seeing is that we
> need to require that users specify a --time parameter on their jobs so
> that Backfill works properly.
>
> None of our users specify a --time param because we have never required
> it. Is that what we need to require in order to fix this situation? From
> the backfill page: "Backfill scheduling is difficult without reasonable
> time limit estimates for jobs, but some configuration parameters that can
> help" and it goes on to list some config params that we have not set
> (DefaultTime, MaxTime, OverTimeLimit). We also see language such as,
> “Since the expected start time of pending jobs depends upon the expected
> completion time of running jobs, reasonably accurate time limits are
> important for backfill scheduling to work well.” So we suspect that we can
> achieve proper backfill scheduling by requiring that all users supply a
> "--time" parameter via a job submit plugin. Would that be a fair
> statement?

You might also need to look at the configuration parameter

  SchedulerParameters

and in particular its option

  bf_window=#
         The number of minutes into the future to look when considering jobs
         to schedule. Higher values result in more overhead and less
         responsiveness. A value at least as long as the highest allowed time
         limit is generally advisable to prevent job starvation. In order to
         limit the amount of data managed by the backfill scheduler, if the
         value of bf_window is increased, then it is generally advisable to
         also increase bf_resolution. This option applies only to
         SchedulerType=sched/backfill. Default: 1440 (1 day), Min: 1, Max:
         43200 (30 days).
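
As a rough sketch only, the relevant lines in slurm.conf might look
something like the following. The partition name, node list and all values
are made up for illustration and are not recommendations for your site; the
point is just that bf_window (in minutes) should cover the longest MaxTime,
and that DefaultTime gives jobs submitted without --time a finite limit.

```
# slurm.conf fragment (illustrative values, hypothetical partition and nodes)
SchedulerType=sched/backfill

# bf_window is in minutes: 10080 = 7 days, matching the MaxTime below.
# bf_resolution is in seconds; raised from its default of 60 to limit
# the amount of data the backfill scheduler has to manage.
SchedulerParameters=bf_window=10080,bf_resolution=600

# Jobs submitted without --time get 1 hour; no job may run longer than 7 days.
PartitionName=main Nodes=node[01-10] Default=YES DefaultTime=01:00:00 MaxTime=7-00:00:00 State=UP
```

After changing these, "scontrol show config | grep SchedulerParameters"
should show whether the running slurmctld has picked up the new values.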

Regards

Loris Bennett

-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin

