Hello,

We are running Slurm 17.02.6 with accounting on a Cray CLE system.

We currently have a 24-hour run limit (MaxTime) on our partitions, and a user 
needs to run a job that will exceed 24 hours of runtime.  I tried to create a 
reservation, shown below, giving the user 36 hours to run his job, but the job 
was still killed at the 24-hour limit.  Can someone explain what is going on, 
and what the proper way is to let a user exceed the partition time limit 
without having to modify slurm.conf, push it out to all of the nodes, run 
Ansible plays, reconfigure, and so on?  I thought that this was what 
reservations were for.
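
For reference, the job was submitted into the reservation roughly like this 
(the script name here is just a placeholder):

sbatch --reservation=CoolBreeze --partition=GPU --time=1-12:00:00 job.sh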

Here is the reservation I created; the job was still killed when it ran past 
24 hours:

scontrol show res
ReservationName=CoolBreeze StartTime=2018-12-27T10:08:11 EndTime=2018-12-28T22:08:11 Duration=1-12:00:00
   Nodes=nid00[192-239] NodeCnt=48 CoreCnt=480 Features=(null) PartitionName=GPU Flags=
   TRES=cpu=960
   Users=coolbreeze Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
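
If it helps, I created the reservation with approximately this command (typed 
from memory, so the exact options may differ slightly):

scontrol create reservation ReservationName=CoolBreeze Users=coolbreeze StartTime=now Duration=1-12:00:00 PartitionName=GPU Nodes=nid00[192-239]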

Here is the partition with the resources the user needs to run on:

PartitionName=GPU
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=nid00[192-255]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=1280 TotalNodes=64 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
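
For completeness, what I am trying to avoid is the usual routine of editing 
the partition definition in slurm.conf, e.g. bumping MaxTime along these lines 
(line abbreviated, values illustrative):

PartitionName=GPU Nodes=nid00[192-255] MaxTime=1-12:00:00 State=UP

and then pushing the file out to every node and running "scontrol reconfigure".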

Thanks!


