a. You pretty much have to roll your own.  We do it with our serial_requeue partition, which underlies all of our hardware and sits at a lower priority than our other partitions.
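
As a rough illustration, a "roll your own" scavenger setup along these lines might look like the slurm.conf fragment below. This is only a sketch, not the actual configuration described above: it assumes partition-priority preemption, and the node list, the "general" partition, and the PriorityTier values are hypothetical placeholders.

```
# Sketch only: node names, the "general" partition, and tier values are hypothetical.
PreemptType=preempt/partition_prio   # preempt based on partition priority tier
PreemptMode=REQUEUE                  # cluster-wide default: preempted jobs are requeued
JobRequeue=1                         # allow batch jobs to be requeued

# Higher-tier partition: its jobs are never preempted.
PartitionName=general        Nodes=node[01-16] PriorityTier=10 PreemptMode=OFF Default=YES

# Low-tier partition spanning the same hardware: its jobs are requeued
# whenever a job in a higher-tier partition needs the resources.
PartitionName=serial_requeue Nodes=node[01-16] PriorityTier=1  PreemptMode=REQUEUE
```

Jobs submitted to the low-tier partition should be safe to restart, since a requeued job starts again from the beginning unless it does its own checkpointing.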

b. I haven't used suspend for partition-based preemption, so I'm not aware of what quirks it has.  We use requeue.  One caution I would have about suspend is that while a job is suspended, the memory it was using stays allocated.  That may be why your jobs are not being preempted immediately: Slurm still counts that memory as allocated even though the CPUs are now free.

-Paul Edmon-

On 7/8/19 6:03 PM, Hanu Pathuri wrote:

Hello,

I am trying to set up my Slurm cluster. One of the things I want to achieve is to schedule jobs that run only when there are no high-priority tasks.

My understanding is that this can be achieved either by configuring a partition with preempt mode ‘Suspend/Requeue’ and a very low priority, or by configuring a QOS with a very low priority.

Here are my questions:

 1. Is there a built-in ‘scavenger’ feature, partition, or QOS that I
    can make use of?
 2. When I create a ‘scavenger’ partition with lower priority
    (preempt mode=suspend) and a debug partition with higher priority
    (preempt mode=suspend), the scheduler does not immediately preempt
    the jobs running on the scavenger partition when I submit jobs to
    the debug partition. However, if I change the preempt mode of the
    scavenger partition to ‘requeue’, preemption happens immediately.

Thanks
