Thank you, Paul. I'll try this workaround.
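
For the record, roughly what I plan to run (a minimal sketch of the workaround Paul describes; the job ID is a placeholder):

    # Stop the job's processes in place. Unlike `scontrol suspend`, the job
    # stays in the RUNNING state from slurmctld's point of view, so its CPUs
    # remain allocated and no new jobs should be packed onto them.
    scancel --signal=STOP <jobid>

    # Let the processes continue later.
    scancel --signal=CONT <jobid>

If the job's work runs directly from the batch script rather than via srun, the `--full` option may also be needed so the signal reaches those processes.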
Best,
Jianwen

> On Sep 16, 2020, at 9:31 PM, Paul Edmon <ped...@cfa.harvard.edu> wrote:
>
> This is a feature of suspend. When Slurm suspends a job it pauses the job
> and keeps its memory reserved, but it does not keep the CPUs used by that
> job reserved.
>
> If you want to pause jobs and not have contention, you need to use scancel
> with:
>
> -s, --signal=signal_name
>     The name or number of the signal to send. If this option is not used
>     the specified job or step will be terminated. Note: if this option is
>     used, the signal is sent directly to the slurmd where the job is
>     running, bypassing the slurmctld, thus the job state will not change
>     even if the signal is delivered to it. Use the scontrol command if you
>     want the job state change to be known to slurmctld.
>
> and issue SIGSTOP or SIGCONT.
>
> Frankly, I wish suspend didn't work like this. It should suspend the job
> without releasing the CPUs, keeping them reserved. That's the natural
> understanding of suspend, but that's not the way suspend actually works in
> Slurm.
>
> -Paul Edmon-
>
> On 9/16/2020 6:08 AM, SJTU wrote:
>> Hi,
>>
>> I am using Slurm 19.05 and found that Slurm may launch jobs onto nodes
>> that already hold suspended jobs, which leads to resource contention when
>> the suspended jobs are later resumed. Steps to reproduce the issue:
>>
>> 1. Launch 40 one-core jobs on a 40-core compute node.
>> 2. Suspend all 40 jobs on that compute node with `scontrol suspend JOBID`.
>>
>> Expected result: no more jobs should be launched onto the compute node,
>> since there are already 40 suspended jobs on it.
>>
>> Actual result: Slurm launches new jobs on that compute node, which may
>> lead to resource contention if the previously suspended jobs are then
>> restored via `scontrol resume`.
>>
>> Any suggestion is appreciated. The relevant part of slurm.conf is attached
>> below.
>>
>> Thank you!
>>
>> Jianwen
>>
>>
>> AccountingStorageEnforce = associations,limits,qos,safe
>> AccountingStorageType = accounting_storage/slurmdbd
>> AuthType = auth/munge
>> BackupController = slurm2
>> CacheGroups = 0
>> ClusterName = mycluster
>> ControlMachine = slurm1
>> EnforcePartLimits = true
>> Epilog = /etc/slurm/slurm.epilog
>> FastSchedule = 1
>> GresTypes = gpu
>> HealthCheckInterval = 300
>> HealthCheckProgram = /usr/sbin/nhc
>> InactiveLimit = 0
>> JobAcctGatherFrequency = 30
>> JobAcctGatherType = jobacct_gather/cgroup
>> JobCompType = jobcomp/none
>> JobRequeue = 0
>> JobSubmitPlugins = lua
>> KillOnBadExit = 1
>> KillWait = 30
>> MailProg = /opt/slurm-mail/bin/slurm-spool-mail.py
>> MaxArraySize = 8196
>> MaxJobCount = 100000
>> MessageTimeout = 30
>> MinJobAge = 300
>> MpiDefault = none
>> PriorityDecayHalfLife = 31-0
>> PriorityFavorSmall = false
>> PriorityFlags = ACCRUE_ALWAYS,FAIR_TREE
>> PriorityMaxAge = 7-0
>> PriorityType = priority/multifactor
>> PriorityWeightAge = 10000
>> PriorityWeightFairshare = 10000
>> PriorityWeightJobSize = 40000
>> PriorityWeightPartition = 10000
>> PriorityWeightQOS = 0
>> PrivateData = accounts,jobs,usage,users,reservations
>> ProctrackType = proctrack/cgroup
>> Prolog = /etc/slurm/slurm.prolog
>> PrologFlags = contain
>> PropagateResourceLimitsExcept = MEMLOCK
>> RebootProgram = /usr/sbin/reboot
>> ResumeTimeout = 600
>> ResvOverRun = UNLIMITED
>> ReturnToService = 1
>> SchedulerType = sched/backfill
>> SelectType = select/cons_res
>> SelectTypeParameters = CR_CPU
>> SlurmUser = root
>> SlurmctldDebug = info
>> SlurmctldLogFile = /var/log/slurmctld.log
>> SlurmctldPidFile = /var/run/slurmctld.pid
>> SlurmctldPort = 6817
>> SlurmctldTimeout = 120
>> SlurmdDebug = info
>> SlurmdLogFile = /var/log/slurmd.log
>> SlurmdPidFile = /var/run/slurmd.pid
>> SlurmdPort = 6818
>> SlurmdSpoolDir = /tmp/slurmd
>> SlurmdTimeout = 300
>> SrunPortRange = 60001-63000
>> StateSaveLocation = /etc/slurm/state
>> SwitchType = switch/none
>> TaskPlugin = task/cgroup
>> Waittime = 0
>>
>>
>> # Nodes
>> NodeName=cas[001-100] CPUs=40 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=190000 Weight=60
>>
>>
>> # Partitions
>> PartitionName=small Nodes=cas[001-100] MaxCPUsPerNode=39 MaxNodes=1 MaxTime=7-00:00:00 DefMemPerCPU=4700 MaxMemPerCPU=4700 State=UP AllowQos=ALL
>>
> _______________________________________________
> Support mailing list
> supp...@lists.hpc.sjtu.edu.cn
> http://lists.hpc.sjtu.edu.cn/mailman/listinfo/support
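
P.S. For completeness, the reproduction steps quoted above can be scripted roughly as follows; the node name cas001, the sleep payload, and the use of --wrap are only placeholders for illustration:

    # Submit 40 one-core jobs targeted at a single 40-core node.
    for i in $(seq 1 40); do
        sbatch --parsable -N 1 -n 1 -w cas001 --wrap="sleep 3600"
    done

    # Once they are running, suspend every job on that node.
    squeue -h -t RUNNING -w cas001 -o %i | xargs -n 1 scontrol suspend

    # New submissions can then still be scheduled onto cas001, which is the
    # behavior reported above.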