Hi,
could you try submitting the following script:
Script job.sh:
******************************
#!/bin/bash
#SBATCH -p test-low
#SBATCH -n 3
#SBATCH -t 12:00:00
# Handler for SIGTERM: log the event and a timestamp to the file slask_term
sig_term()
{
    echo "function sig_term called. Exiting"
    echo 'sig_term' > slask_term
    date >> slask_term
}
# associate the function "sig_term" with the TERM signal
trap 'sig_term' SIGTERM
sleep 1000 &
wait $!
******************************
and see if you catch the first SIGTERM. When I tried this, the signal was
ONLY caught at the end of the grace time.
(I'll try your settings as soon as my system is up again.)
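A note on why job.sh puts sleep in the background and then calls wait: bash
only runs a trap handler once the current foreground command has finished,
whereas the wait builtin returns as soon as a trapped signal arrives. Below is
a minimal sketch of that behaviour outside of Slurm, assuming bash 4 or later
(for $BASHPID). The function name demo_trap_wait, the 30-second stand-in
workload and the 1-second delay are only for illustration.

```shell
#!/bin/bash
# Illustration only: demo_trap_wait is a hypothetical helper, not part of Slurm.
demo_trap_wait() {
    local caught=no status child killer
    local self=$BASHPID              # PID of the shell running this function

    trap 'caught=yes' TERM           # handler just records that TERM arrived

    sleep 30 >/dev/null &            # stand-in for the real workload
    child=$!

    # Send ourselves a TERM after one second (assumption: this mimics the
    # signal a scheduler would deliver to the batch script).
    ( sleep 1; kill -TERM "$self" ) >/dev/null &
    killer=$!

    # wait is interrupted by the trapped signal and returns a status > 128;
    # a foreground "sleep 30" would delay the handler until sleep had finished.
    wait "$child"
    status=$?

    # Clean up the stand-in workload and restore the default TERM action.
    kill "$child" 2>/dev/null
    wait "$child" "$killer" 2>/dev/null
    trap - TERM

    echo "caught=$caught status=$status"
}

demo_trap_wait
```

If the handler works as expected, the function reports caught=yes after about
one second rather than after the full 30 seconds.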
Regards,
/jon
On 11/20/2017 04:21 PM, Ailing Zhang wrote:
Hi slurm community,
I'm testing partition-based preemption. Partitions
test-high and test-low share the same nodes. I set GraceTime=600
and PreemptMode=CANCEL on test-low, but as soon as I submit a job to
test-high, the job in test-low is killed immediately, without any grace time.
Here are my configs:
PartitionName=test-low
AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=600
Hidden=NO
MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO
MaxCPUsPerNode=UNLIMITED
Nodes=node[100-102]
PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO
OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=CANCEL
State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=test-high
AllowGroups=admins AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
Hidden=NO
MaxNodes=UNLIMITED MaxTime=02:00:00 MinNodes=1 LLN=NO
MaxCPUsPerNode=UNLIMITED
Nodes=node[100-102] PriorityJobFactor=30 PriorityTier=30
RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=100 TotalNodes=3 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
Any help will be much appreciated.
Thanks!
Ailing