I’ve set up SLURM with preemption enabled so that high-priority jobs can take 
over resources from lower-priority jobs.  As we use a lot of expensive EDA 
software, we want to get the best use out of these licenses.  The software all 
uses the FlexLM license manager, and when a job is suspended with SIGTSTP it 
releases its license, allowing another job to use it, and then checks the 
license out again when it is resumed with SIGCONT.

I wrote a simple BASH script to test this behavior with SLURM:

#!/bin/bash

function suspendJob () {
  echo "INFO: Job Suspended"
}

function resumeJob () {
  echo "INFO: Job Resumed"
}

function terminateJob () {
  echo "INFO: Job Terminating..."
}

trap suspendJob   SIGTSTP
trap resumeJob    SIGCONT
trap terminateJob SIGTERM

echo "Burning some compute now...."
yes > /dev/null

When I configure SLURM to use:

     ProctrackType=proctrack/pgid

This works as expected when I manually suspend, resume, or cancel a job: each 
of the corresponding messages appears in the SLURM StdOut file.
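
For reference, the same trap behavior can be checked without SLURM at all by 
sending the signals by hand.  This is only a minimal sketch: the worker 
function, the log file, and the 30-second sleep standing in for the real 
workload are all hypothetical names and values chosen for illustration.  It 
also runs the workload in the background and waits on it, since bash defers a 
trap until the current foreground command has finished.

```shell
#!/bin/bash
# Stand-alone sketch: no SLURM involved, signals are sent by hand with kill.
# worker, log, and the sleep-based workload are illustrative assumptions.

worker() {
  trap 'echo "INFO: Job Suspended"' SIGTSTP
  trap 'echo "INFO: Job Resumed"'   SIGCONT
  trap 'echo "INFO: Job Terminating..."; kill "$child" 2>/dev/null; exit 0' SIGTERM

  sleep 30 &            # stand-in for the real workload
  child=$!
  # wait returns early whenever a trapped signal arrives,
  # so keep waiting until the child has actually gone away.
  while kill -0 "$child" 2>/dev/null; do
    wait "$child"
  done
}

log=$(mktemp)
worker > "$log" 2>&1 &  # run the worker in a background subshell
pid=$!

sleep 1                 # give the worker time to install its traps
for sig in TSTP CONT TERM; do
  kill -s "$sig" "$pid"
  sleep 1
done

wait "$pid"
cat "$log"
```

If the three INFO lines show up in the log, the handlers themselves are fine, 
and any difference seen under SLURM comes from how the signals are (or are 
not) delivered to the job.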

When I change SLURM to use CGROUPS:

     ProctrackType=proctrack/cgroup

No messages appear at all in the SLURM StdOut file, which indicates that the 
cgroup was placed in the freezer without any signals being sent.  Is this 
expected behavior, and is there a way to “fix” it so that it behaves the same 
way as with process groups?

Maybe this is a moot point since SLURM still shows the License being Used under 
“scontrol show license” even if a job is suspended, but I figure that problem 
might be solvable…

Thanks,
Michael


