Hmmm... the man page says of "reduce_completing_frag,"
"By default if a job is found completing then no jobs are scheduled. If
this parameter is used the node in a completing job are taken out of
consideration."
This feels like it's missing a word or two. The first sentence says
that, by default, no jobs are scheduled if another job is found
"completing." The second sentence suggests that this parameter must be
set to prevent jobs from being scheduled in this situation.
BTW, the first sentence describes what I had expected to be the case.
What am I missing?
Andy
On 04/16/2018 02:15 PM, Kilian Cavalotti wrote:
Hi Andy,
On Mon, Apr 16, 2018 at 8:43 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
I hadn't realized that jobs can be scheduled to run on a node that is still
in "completing" state from an earlier job. We occasionally use epilog
scripts that can take 30 seconds or longer, and we really don't want the
next job to start until the epilog scripts have completed.
Other than coding a little loop to wait until the desired nodes are "idle"
before scheduling a job, is there an automated way to say "don't start a job
on a node until it reaches 'idle' status?"
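
For reference, the "little loop" workaround might look roughly like the sketch below. It assumes bash and that sinfo is on the PATH; the function name wait_for_idle, the 5-second polling interval, and the node names in the usage line are made up for illustration. It treats only the exact state string "idle" as ready, so flagged states such as "idle*" keep it waiting.

```bash
#!/usr/bin/env bash
# wait_for_idle NODELIST [POLL_SECONDS]
# Hypothetical helper: blocks until every node in NODELIST reports "idle".
wait_for_idle() {
    local nodes="$1"
    local interval="${2:-5}"   # assumed polling interval, in seconds
    local states
    while true; do
        # -h: no header, -N: one line per node, -n: limit to these nodes,
        # -o "%T": print the node state in extended form.
        # sort -u collapses the output to the distinct states seen.
        states="$(sinfo -h -N -n "$nodes" -o "%T" | sort -u)"
        if [ "$states" = "idle" ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

It could be used as, for example, wait_for_idle "node[01-04]" && sbatch job.sh. Note that the loop never gives up on its own, so a timeout would be worth adding for real use.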
I'd recommend taking a look at the following options in slurm.conf:
* CompleteWait,
* reduce_completing_frag (in SchedulerParameters).
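
As a rough illustration only, the two options might appear in slurm.conf as in the fragment below. The value 60 is an assumption, picked to exceed the 30-second epilog mentioned above, and SchedulerParameters is the full name of the slurm.conf option that takes reduce_completing_frag. Please check the slurm.conf man page for your Slurm version before relying on either setting.

```
# slurm.conf (fragment), illustrative values only
# Seconds to hold off scheduling further jobs while a job is COMPLETING.
# 60 is an assumed value, chosen to be longer than a ~30 s epilog.
CompleteWait=60
# reduce_completing_frag is one of the comma-separated SchedulerParameters.
SchedulerParameters=reduce_completing_frag
```

If SchedulerParameters is already set at your site, the new value would be appended to the existing comma-separated list rather than added as a second line.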
Cheers,