Our MaxTime and DefaultTime are both 14 days. Setting a high DefaultTime was a
convenience to our users (and the support team) but has turned out to be a
mistake because it hurts backfill. Under high load we see small backfill jobs
take over because the estimated start and end times of "DefaultTime" jobs are
wildly inaccurate -- the backfill algorithm is less likely to predict that a
small job will delay the larger, highest-priority jobs, so it backfills the
smaller jobs. I've tuned many of the backfill SchedulerParameters, but there's
no replacement for an accurate time estimate.
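
To illustrate the effect, here is a toy sketch in Python. It assumes a heavily simplified model of backfill: the top-priority job needs every running job to finish, and the scheduler only knows each running job's start time and time limit. The function names and numbers are made up for illustration; this is not Slurm's actual algorithm.

```python
def expected_start(running_jobs):
    """Hour at which the top-priority job is expected to start.

    running_jobs is a list of (started_at_hour, time_limit_hours) pairs.
    The scheduler cannot see real runtimes, so it assumes every running
    job uses its full time limit.
    """
    return max(started + limit for started, limit in running_jobs)


def can_backfill(now, candidate_limit, reserved_start):
    """A candidate may be backfilled only if, by its own time limit,
    it would finish before the top-priority job's expected start."""
    return now + candidate_limit <= reserved_start


FOURTEEN_DAYS = 14 * 24  # hours, the inflated DefaultTime

# Running jobs really finish after about 6 hours, but inherited the default.
inflated = expected_start([(0, FOURTEEN_DAYS), (0, FOURTEEN_DAYS)])
accurate = expected_start([(0, 6), (0, 6)])

# A small job with a 48-hour limit arrives at hour 0.
print(can_backfill(0, 48, inflated))  # True: looks harmless, gets backfilled
print(can_backfill(0, 48, accurate))  # False: it would delay the big job
```

With the inflated limits the 48-hour job is backfilled and can hold nodes well past hour 6, which is when the large job could actually have started.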

Default values also become difficult to change once hundreds of submit scripts
rely on them by never setting a time limit. Jason, I think setting a small
DefaultTime limit is a good approach. We've considered resetting our default
to 1 minute to force jobs to specify a time, but will (likely) target an
average-ish value now that we have stats from a couple of million jobs.
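
One possible way to pick an "average-ish" value from job statistics, as a rough sketch only: take a percentile of the elapsed runtimes and round it up to a convenient step. The helper names, the choice of percentile, the 15-minute rounding, and the sample data below are all assumptions for illustration, not our actual numbers or method.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_default_minutes(elapsed_minutes, pct=50, round_to=15):
    """Suggest a DefaultTime (in minutes) from observed job runtimes,
    rounded up to the next multiple of round_to."""
    value = nearest_rank_percentile(elapsed_minutes, pct)
    return math.ceil(value / round_to) * round_to


# Made-up elapsed times in minutes; real data would come from accounting.
sample = [2, 5, 12, 30, 45, 60, 90, 240, 1440, 4320]

print(suggest_default_minutes(sample))          # median-based suggestion
print(suggest_default_minutes(sample, pct=90))  # covers 90% of past jobs
```

A median-based default keeps estimates tight for typical jobs but forces long jobs to state a limit; a higher percentile is gentler on users at the cost of looser estimates.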

- Sebastian

--

[University of Nevada, Reno]<http://www.unr.edu/>
Sebastian Smith
High-Performance Computing Engineer
Office of Information Technology
1664 North Virginia Street
MS 0291

work-phone: 775-682-5050
email: stsm...@unr.edu
website: http://rc.unr.edu

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Jason 
Simms <sim...@lafayette.edu>
Sent: Tuesday, October 6, 2020 7:53 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Simple free for all cluster

FWIW, I define the DefaultTime as 5 minutes, which effectively means that users
must define a time limit for any "real" job. It helps users get into that
habit, because without a short DefaultTime most will not even bother to
think critically and carefully about what time limit is actually reasonable,
which is important for, e.g., effective job backfill and scheduling estimates.

I currently don't have a MaxTime defined, because how do I know how long a job 
will take? Most jobs on my cluster require no more than 3-4 days, but in some 
cases at other campuses, I know that jobs can run for weeks. I suppose even 
setting a time limit such as 4 weeks would be overkill, but at least it's not 
infinite. I'm curious what others use as that value, and how you arrived at it.

Warmest regards,
Jason

On Tue, Oct 6, 2020 at 5:55 AM John H <j...@sdf.org> wrote:
Yes, I hadn't considered that! Thanks for the tip, Michael. I shall do that.

John

On Fri, Oct 02, 2020 at 01:49:44PM +0000, Renfro, Michael wrote:
> Depending on the users who will be on this cluster, I'd probably adjust the 
> partition to have a defined, non-infinite MaxTime, and maybe a lower 
> DefaultTime. Otherwise, it would be very easy for someone to start a job that 
> reserves all cores until the nodes get rebooted, since all they have to do is 
> submit a job with no explicit time limit (which would then use DefaultTime, 
> which itself has a default value of MaxTime).
>



--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
