a. No, you pretty much have to roll your own. We do it with our
serial_requeue partition, which covers all of our hardware and runs at a
lower priority than our other partitions.
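A minimal slurm.conf sketch of a requeue-based scavenger setup along these lines, assuming partition-priority preemption. Nodes=ALL and the PriorityTier values are placeholders, and "debug" stands in for whatever higher-priority partitions share the same nodes:

```
# Preempt based on partition PriorityTier; preempted jobs are requeued.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# Low-priority partition laid over all nodes; its jobs get requeued when preempted.
PartitionName=serial_requeue Nodes=ALL PriorityTier=1 PreemptMode=REQUEUE
# Higher-priority partition on the same nodes; its jobs are not preempted.
PartitionName=debug Nodes=ALL PriorityTier=10 PreemptMode=OFF
```

Because requeue cancels and resubmits the preempted job, its CPUs and memory are released right away.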
b. I haven't used suspend for partition-based preemption, so I don't know
what quirks it has; we use requeue. One caution about suspend: while a
job is suspended, the memory it was using is still allocated. That may be
why your jobs are not starting immediately, since Slurm still considers
that memory allocated even though the CPUs are now free.
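For comparison, a sketch of the suspend variant, assuming partition-priority preemption and placeholder Nodes/PriorityTier values. As I read the slurm.conf documentation, SUSPEND preemption also requires GANG at the cluster level. Even when that is set, a suspended job keeps its memory, so a higher-priority job that needs that memory may still have to wait:

```
# SUSPEND preemption needs gang scheduling enabled cluster-wide.
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

# Low-priority partition whose jobs are suspended (memory stays allocated).
PartitionName=scavenger Nodes=ALL PriorityTier=1 PreemptMode=SUSPEND
# Higher-priority partition on the same nodes.
PartitionName=debug Nodes=ALL PriorityTier=10 PreemptMode=OFF
```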
-Paul Edmon-
On 7/8/19 6:03 PM, Hanu Pathuri wrote:
Hello,
I am trying to set up my SLURM cluster. One of the things I want to
achieve is to schedule jobs that run only when there are no
high-priority tasks.
My understanding is that this can be achieved either by configuring a
partition with preempt mode ‘Suspend/Requeue’ and a very low priority,
or by configuring a QOS with a very low priority.
Here are my questions:
1. Is there a built-in ‘scavenger’ feature, partition, or QOS that I
can use?
2. I created a ‘scavenger’ partition with lower priority
(preempt mode=suspend) and a debug partition with higher priority
(preempt mode=suspend). When I submit jobs to the debug partition, the
scheduler does not immediately preempt the jobs running on the scavenger
partition. However, if I change the preempt mode for the scavenger
partition to ‘requeue’, preemption happens immediately.
Thanks