Hi All

I have been reading the archive, hoping to implement UnkillableStepTimeout and
UnkillableStepProgram in Slurm, but I'm a bit confused about how they are
applied.


  1.  I presume UnkillableStepTimeout is set in slurm.conf and acts as a
timer that triggers UnkillableStepProgram.
  2.  UnkillableStepProgram can be used to send an email or reboot the compute
node. The question is: how do we configure it?


scontrol show config | grep -i kill
KillOnBadExit           = 1
KillWait                = 30 sec
UnkillableStepProgram   = (null)
UnkillableStepTimeout   = 300 sec
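
To make the question concrete, here is a minimal sketch of what I imagine the
setup would look like. The script path, the log tag, and the mail address are
hypothetical. I am also assuming that slurmd runs the program on the compute
node and passes the job and step IDs in SLURM_JOB_ID and SLURM_STEP_ID; that
should be checked against the slurm.conf man page.

```shell
#!/bin/bash
# Hypothetical script, e.g. /usr/local/sbin/unkillable_step.sh
#
# Assumed slurm.conf entries (path and value are illustrative):
#   UnkillableStepProgram=/usr/local/sbin/unkillable_step.sh
#   UnkillableStepTimeout=300

# Build a one-line report. SLURM_JOB_ID and SLURM_STEP_ID are assumed to be
# exported by slurmd; fall back to "unknown" if they are not set.
build_report() {
    echo "Unkillable step on $(uname -n): job=${SLURM_JOB_ID:-unknown} step=${SLURM_STEP_ID:-unknown}"
}

report=$(build_report)

# Record the event in syslog when logger is available.
if command -v logger >/dev/null 2>&1; then
    logger -t unkillable_step "$report" || true
fi

# Possible follow-up actions, left commented out on purpose:
# echo "$report" | mail -s "Unkillable step" admin@example.com   # hypothetical address
# scontrol update NodeName="$(uname -n)" State=DRAIN Reason="unkillable step"
```

The report is built in a separate function so that the notification or reboot
action can be swapped without touching the rest of the script.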

Please advise

Thanks
Mike
