It shouldn't impact running jobs. All it should really do is affect
pending jobs, since it will reorder them by their relative priority scores.
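If you want to sanity-check the ordering once the new config is live, the
standard Slurm tools will show it; a couple of illustrative invocations
(the factor values you see will of course depend on your site's usage):

    # Per-factor priority breakdown (age, fairshare, job size, ...)
    # for every pending job:
    sprio -l

    # The fair-share tree: raw usage and shares per association,
    # which is what feeds the fairshare factor above:
    sshare -a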
-Paul Edmon-
On 4/30/2021 12:39 PM, Walsh, Kevin wrote:
Hello everyone,
We wish to deploy a "fair share" scheduling configuration and would like
to ask whether we should expect any effects on jobs that are already
running or already queued at the moment the config is changed.
The proposed changes are from the example at
https://slurm.schedmd.com/archive/slurm-18.08.9/priority_multifactor.html#config :
# Activate the Multi-factor Job Priority Plugin with decay
PriorityType=priority/multifactor
# 2 week half-life
PriorityDecayHalfLife=14-0
# The larger the job, the greater its job size priority.
PriorityFavorSmall=NO
# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0
# This next group determines the weighting of each of the
# components of the Multi-factor Job Priority Plugin.
# The default value for each of the following is 1.
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=0 # don't use the qos factor
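If we are reading the formula on that page correctly, a pending job's
priority under these weights would come out as follows (ignoring the TRES
and nice adjustments; the factor values below are made up purely for
illustration, not real data from our cluster):

    Job_priority = PriorityWeightAge       * age_factor
                 + PriorityWeightFairshare * fairshare_factor
                 + PriorityWeightJobSize   * job_size_factor
                 + PriorityWeightPartition * partition_factor
                 + PriorityWeightQOS       * qos_factor

For example, a job that has waited 7 days (age_factor 0.5 given
PriorityMaxAge=14-0), whose association has a fairshare_factor of 0.3,
with job_size_factor 0.1 and partition_factor 1.0, would get

      1000*0.5 + 10000*0.3 + 1000*0.1 + 1000*1.0 + 0
    = 500 + 3000 + 100 + 1000
    = 4600

so with PriorityWeightFairshare an order of magnitude larger than the
other weights, fair share should dominate the ordering, as intended.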
We're running Slurm 18.08.8 on CentOS Linux 7.8.2003. The current
slurm.conf uses the defaults as far as fair share is concerned:
EnforcePartLimits=ALL
GresTypes=gpu
MpiDefault=pmix
ProctrackType=proctrack/cgroup
PrologFlags=x11,contain
PropagateResourceLimitsExcept=MEMLOCK,STACK
RebootProgram=/sbin/reboot
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
SlurmdSyslogDebug=verbose
StateSaveLocation=/var/spool/slurm/ctld
SwitchType=switch/none
TaskPlugin=task/cgroup,task/affinity
TaskPluginParam=Sched
HealthCheckInterval=300
HealthCheckProgram=/usr/sbin/nhc
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
DefMemPerCPU=1024
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
AccountingStorageHost=sched-db.lan
AccountingStorageLoc=slurm_acct_db
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStoragePort=6819
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
AccountingStoreJobComment=YES
AccountingStorageTRES=gres/gpu
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=info
SlurmdDebug=info
SlurmSchedLogFile=/var/log/slurm/slurmsched.log
SlurmSchedLogLevel=1
Node and partition configs are omitted above.
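For what it's worth, the plan would be to distribute the updated
slurm.conf to every node and then have the controller re-read it, along
these lines (a sketch only; if changing PriorityType turns out to require
it, we would restart slurmctld instead of reconfiguring):

    # after pushing the new slurm.conf to all nodes:
    scontrol reconfigure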
Any and all advice will be greatly appreciated.
Best wishes,
~Kevin
Kevin Walsh
Senior Systems Administration Specialist
New Jersey Institute of Technology
Academic & Research Computing Systems