Thanks Jeffrey, bypassing SIGTERM solved my problem! :D

Best,
Ailing

On Mon, Nov 20, 2017 at 8:33 AM, Jeffrey Frey <f...@udel.edu> wrote:

> • *GraceTime*: Specifies a time period for a job to execute after it is
> selected to be preempted. This option can be specified by partition or QOS
> using the slurm.conf file or database respectively. This option is only
> honored if PreemptMode=CANCEL. The GraceTime is specified in seconds and
> the default value is zero, which results in no preemption delay. Once a job
> has been selected for preemption, its end time is set to the current time
> plus GraceTime. The job is immediately sent SIGCONT and SIGTERM signals in
> order to provide notification of its imminent termination. This is followed
> by the SIGCONT, SIGTERM and SIGKILL signal sequence upon reaching its new
> end time.
>
> "The job is immediately sent SIGCONT and SIGTERM signals in order to
> provide notification of its imminent termination."
>
> Default behavior on SIGTERM is for a program to exit; your program is
> probably ending when it receives that initial SIGTERM.
>
> On Nov 20, 2017, at 10:21 AM, Ailing Zhang <zhangal1...@gmail.com> wrote:
>
> Hi slurm community,
>
> I'm testing partition-based preemption. Partitions test-high and
> test-low share the same nodes. I set GraceTime=600 and
> PreemptMode=CANCEL in test-low. But once I submitted a job to test-high,
> the job in test-low was immediately killed without any grace time.
> Here are my configs.
> PartitionName=test-low
> AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
> AllocNodes=ALL Default=NO QoS=N/A
> DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=600
> Hidden=NO
> MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO
> MaxCPUsPerNode=UNLIMITED
> Nodes=node[100-102]
> PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO
> OverSubscribe=NO
> OverTimeLimit=NONE PreemptMode=CANCEL
> State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
> DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> PartitionName=test-high
> AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
> AllocNodes=ALL Default=NO QoS=N/A
> DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
> Hidden=NO
> MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO
> MaxCPUsPerNode=UNLIMITED
> Nodes=node[100-102] PriorityJobFactor=30 PriorityTier=30 RootOnly=NO
> ReqResv=NO OverSubscribe=NO
> OverTimeLimit=NONE PreemptMode=OFF
> State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
> DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> Any help will be much appreciated.
>
> Thanks!
> Ailing
>
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
> Jeffrey T. Frey, Ph.D.
> Systems Programmer V / HPC Management
> Network & Systems Services / College of Engineering
> University of Delaware, Newark DE 19716
> Office: (302) 831-6034 Mobile: (302) 419-4976
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
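
The thread does not show how SIGTERM was bypassed in the end. One way to do it, if the job is a Python program, is to record the early SIGTERM as a preemption notice and use the remaining GraceTime to checkpoint, instead of exiting on the default action. This is only a rough sketch: the names PreemptionNotice, install_notice and run are made up for illustration and are not part of Slurm, and it assumes the signal actually reaches the Python process (for example because it is the process launched for the job step).

```python
import os
import signal
import time


class PreemptionNotice:
    """Hypothetical helper: remembers that SIGTERM arrived instead of exiting."""

    def __init__(self):
        self.received_at = None

    def __call__(self, signum, frame):
        # Keep only the first notice. Per the GraceTime description, SIGTERM
        # is sent again (followed by SIGKILL) once the new end time is reached.
        if self.received_at is None:
            self.received_at = time.monotonic()

    @property
    def requested(self):
        return self.received_at is not None


def install_notice():
    """Replace the default SIGTERM action (terminate) with a PreemptionNotice."""
    notice = PreemptionNotice()
    signal.signal(signal.SIGTERM, notice)  # must be called from the main thread
    return notice


def run(notice, do_step, checkpoint, max_steps):
    """Call do_step until max_steps is reached or a notice arrives, then checkpoint."""
    steps = 0
    while steps < max_steps and not notice.requested:
        do_step(steps)
        steps += 1
    # Save state while there is still grace time left; SIGKILL cannot be caught.
    checkpoint(steps)
    return steps
```

A job would call install_notice() once at startup and pass its own work and checkpoint functions to run(). The checkpoint has to finish well within GraceTime (600 seconds in the test-low partition above), because the final SIGKILL cannot be handled.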