Our MaxTime and DefaultTime are both 14 days. Setting a high DefaultTime was a convenience to our users (and the support team), but it has turned out to be a mistake because it hurts backfill. Under high load we see small backfill jobs take over, because the estimated start and end times of "DefaultTime" jobs are wildly inaccurate: the backfill algorithm is less likely to calculate that a small job would delay the larger, highest-priority jobs, so it backfills the small ones. I've tuned many of the backfill SchedulerParameters, but there's no replacement for an accurate time estimate.
Default values also become difficult to change once hundreds of submit scripts leave the time limit unset and rely on them.

Jason, I think setting a small DefaultTime limit is a good approach. We've considered resetting our default to 1 minute to force jobs to specify a time, but will (likely) target an average-ish value now that we have stats from a couple of million jobs.

- Sebastian

--
University of Nevada, Reno <http://www.unr.edu/>
Sebastian Smith
High-Performance Computing Engineer
Office of Information Technology
1664 North Virginia Street
MS 0291
work-phone: 775-682-5050
email: stsm...@unr.edu
website: http://rc.unr.edu

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Jason Simms <sim...@lafayette.edu>
Sent: Tuesday, October 6, 2020 7:53 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Simple free for all cluster

FWIW, I define the DefaultTime as 5 minutes, which effectively means that for any "real" job, users must actually define a time. It helps users get into that habit, because in the absence of a short DefaultTime, most will not even bother to think critically and carefully about what time limit is actually reasonable, which is important for, e.g., effective job backfill and scheduling estimations.

I currently don't have a MaxTime defined, because how do I know how long a job will take? Most jobs on my cluster require no more than 3-4 days, but in some cases at other campuses, I know that jobs can run for weeks. I suppose even setting a time limit such as 4 weeks would be overkill, but at least it's not infinite. I'm curious what others use as that value, and how you arrived at it.

Warmest regards,
Jason

On Tue, Oct 6, 2020 at 5:55 AM John H <j...@sdf.org> wrote:

Yes, I hadn't considered that! Thanks for the tip, Michael. I shall do that.

John

On Fri, Oct 02, 2020 at 01:49:44PM +0000, Renfro, Michael wrote:
> Depending on the users who will be on this cluster, I'd probably adjust the
> partition to have a defined, non-infinite MaxTime, and maybe a lower
> DefaultTime. Otherwise, it would be very easy for someone to start a job that
> reserves all cores until the nodes get rebooted, since all they have to do is
> submit a job with no explicit time limit (which would then use DefaultTime,
> which itself has a default value of MaxTime).
>

--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
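
For readers skimming the thread, the partition settings being discussed might look roughly like the slurm.conf fragment below. This is only a sketch: the partition name "general" and the node list are hypothetical, the 5-minute DefaultTime is Jason's value, and the 14-day MaxTime is Sebastian's. Neither poster shared an actual configuration.

```conf
# slurm.conf (illustrative fragment; partition and node names are hypothetical)
# DefaultTime is applied to jobs submitted without a time limit.
# MaxTime caps whatever limit a job requests.
# Times use the days-hours:minutes:seconds format.
PartitionName=general Nodes=node[01-16] Default=YES DefaultTime=00:05:00 MaxTime=14-00:00:00 State=UP
```

With a short default like this, any job that needs longer has to ask for it explicitly, for example with "sbatch --time=2-00:00:00 job.sh" (job.sh is a placeholder), which gives the backfill scheduler a real estimate to work with.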