Hello Doug,

Thank you for your detailed reply regarding how to set up backfill. There's
quite a lot to take in there. Fortunately, I have a day or two to read up on
and digest the ideas now that our cluster is down due to a water-cooling
failure. In the first instance, I'll certainly implement bf_continue and
review/amend the "bf_maxjobs" and "bf_interval" parameters. Switching on
backfill debugging sounds very useful, but does that setting tend to bloat the
logs if left enabled for long periods?
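
For reference, this is roughly the change I have in mind -- a sketch only,
with illustrative values rather than recommendations, and with bf_max_job_test
being (I believe) the documented spelling of the overall job-count limit:

    # slurm.conf -- sketch, values illustrative
    SchedulerParameters=bf_continue,bf_interval=60,bf_max_job_test=1000,bf_max_job_user=20
    DebugFlags=Backfill

If the extra logging does prove too chatty, my understanding is that the flag
can also be toggled at runtime with "scontrol setdebugflags +Backfill" (and
removed again with "-Backfill") rather than living permanently in slurm.conf.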


We did have a contract with SchedMD which recently finished. In one of the last
discussions we had, it was suggested that we may have hit a bug, in the sense
that backfilled jobs were potentially stealing nodes intended for
higher-priority jobs -- bug 5297. The advice was to consider upgrading to Slurm
18.08.4 and implementing bf_ignore_newly_avail_nodes. I was interested to see
that you had a similar discussion with SchedMD and did upgrade. I think I ought
to update the backfill configuration as per my first paragraph and see how that
goes before we bite the bullet and do the upgrade (we are currently at 18.08.0).
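
If and when we do upgrade, my understanding is that the change would just be
one more entry in the same parameter list -- again only a sketch of what I
expect it to look like:

    # slurm.conf, once on 18.08.4 or later -- sketch
    SchedulerParameters=bf_continue,bf_interval=60,bf_max_job_test=1000,bf_max_job_user=20,bf_ignore_newly_avail_nodes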


Best regards,

David

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Douglas 
Jacobsen <dmjacob...@lbl.gov>
Sent: 23 March 2019 13:30
To: Slurm User Community List
Subject: Re: [slurm-users] Backfill advice

Hello,

At first blush, bf_continue and bf_interval, as well as bf_maxjobs (if I
remember the parameter name correctly), are critical first steps in tuning.
Setting DebugFlags=backfill is essential for getting the data needed to make
tuning decisions.
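
Alongside the debug log, sdiag is worth running from time to time -- it
reports backfill statistics such as cycle times, depth and backfilled job
counts, which helps in judging whether bf_interval and the job-test limits
are sensible:

    # Scheduler and backfill statistics (cycle times, depth, backfilled jobs)
    sdiag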

Per-user/account limits, if set too low, can also cause starvation, depending
on the way your priority calculation is set up.

I presented these slides at the Slurm User Group a few years ago on this topic:
https://slurm.schedmd.com/SLUG16/NERSC.pdf

The key thing to keep in mind with large jobs is that Slurm needs to evaluate
them again and again in the same order, or the scheduled start time may drift.
Thus it is important that once jobs are getting planning reservations, they
continue to do so.

Because of the prevalence of large jobs at our site we use bf_min_prio_resv,
which splits the priority space into a reserving and a non-reserving set, and
then use job age to allow jobs to age from the non-reserving portion of the
priority space into the reserving portion. Use of the recent
MaxJobsAccruePerUser limits on a job QOS can throttle the rate at which jobs
age and prevent negative effects from users submitting large numbers of jobs.
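
As a rough illustration of that arrangement (the threshold, QOS name and limit
value below are made up, and I believe the documented spelling of the parameter
is bf_min_prio_reserve):

    # slurm.conf -- sketch: jobs below the threshold priority get no planning reservation
    SchedulerParameters=...,bf_min_prio_reserve=100000

    # Limit how many jobs per user accrue age priority at once (QOS name and value illustrative)
    sacctmgr modify qos normal set MaxJobsAccruePerUser=8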

I realize that is a large number of tunables and concepts densely packed, but 
it should give you some reasonable starting points.

Doug


On Sat, Mar 23, 2019 at 05:26 david baker <djbake...@gmail.com> wrote:
Hello,

We do have large jobs getting starved out on our cluster, and I note 
particularly that we never manage to see a job getting assigned a start time. 
It seems very possible that backfilled jobs are stealing nodes reserved for 
large/higher priority jobs.

I'm wondering if our backfill configuration has any bearing on this issue or
whether we are unfortunate enough to have hit a bug. One parameter that is
missing in our bf setup is "bf_continue". Is that parameter significant in
terms of ensuring that bf drills down sufficiently into the job mix? Also, we
are using the default bf frequency -- should we really reduce the frequency
and potentially reduce the number of bf jobs per group/user or in total at
each iteration? Currently, I think we are setting the per-user limit to 20.
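
For what it's worth, this is how I have been checking what we currently have
set (the grep pattern is just illustrative):

    # Show the scheduler-related settings currently in effect
    scontrol show config | grep -i -E 'SchedulerParameters|SchedulerType|DebugFlags'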

Any thoughts would be appreciated, please.

Best regards,
David
--
Sent from Gmail Mobile
