Sure.  Here is what we have:

########################## Scheduling #####################################
### This section is specific to scheduling

### Tells the scheduler to enforce limits for all partitions
### that a job submits to.
EnforcePartLimits=ALL
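### (e.g. a job submitted with "-p partA,partB" -- hypothetical partition names --
### has to fit within the limits of both partitions or it is rejected)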

### Lets Slurm know that we have a job_submit.lua script
### (a minimal sketch of one is included below the config).
JobSubmitPlugins=lua

### When a job is launched, this has slurmctld look up and send the user and group
### information instead of having AD do the lookup on the node itself.
LaunchParameters=send_gids

### Maximum sizes for Jobs.
MaxJobCount=200000
MaxArraySize=10000
DefMemPerCPU=100

### Job Timers
CompleteWait=0

### We set EpilogMsgTime long so that epilog complete messages don't all pile up
### at one time after a forced exit, which can cause problems for the master.
EpilogMsgTime=3000000
InactiveLimit=0
KillWait=30

### This only applies to the reservation time limit; the job must still obey
### the partition time limit.
ResvOverRun=UNLIMITED
MinJobAge=600
Waittime=0

### Scheduling parameters
### FastSchedule 2 lets Slurm know not to auto-detect the node config but rather
### follow our definition.  We also use setting 2 because, due to our geographic
### size, nodes may drop out of Slurm and then reconnect.  If we had 1 they would
### be set to drain when they reconnect.  Setting it to 2 allows them to rejoin
### without issue.
FastSchedule=2
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

### Governs the default preemption behavior
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

### default_queue_depth should be some multiple of the partition_job_depth,
### ideally number_of_partitions * partition_job_depth, but typically the main
### loop exits prematurely if you go over about 400.  A partition_job_depth of
### 10 seems to work well.
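### For reference, bf_window is in minutes (11520 = 8 days of backfill lookahead)
### and bf_resolution is in seconds.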
SchedulerParameters=\
default_queue_depth=1150,\
partition_job_depth=10,\
max_sched_time=50,\
bf_continue,\
bf_interval=30,\
bf_resolution=600,\
bf_window=11520,\
bf_max_job_part=0,\
bf_max_job_user=10,\
bf_max_job_test=10000,\
bf_max_job_start=1000,\
bf_ignore_newly_avail_nodes,\
kill_invalid_depend,\
pack_serial_at_end,\
nohold_on_prolog_fail,\
preempt_strict_order,\
preempt_youngest_first,\
max_rpc_cnt=8

################################ Fairshare ################################
### This section sets the fairshare calculations

PriorityType=priority/multifactor

### Settings for fairshare calculation frequency and shape.
FairShareDampeningFactor=1
PriorityDecayHalfLife=28-0
PriorityCalcPeriod=1

### Settings for fairshare weighting.
PriorityMaxAge=7-0
PriorityWeightAge=10000000
PriorityWeightFairshare=20000000
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=1000000000
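
For context on how those weights combine: with priority/multifactor a job's
priority works out to roughly

  Job_priority = PriorityWeightAge       * age_factor
               + PriorityWeightFairshare * fairshare_factor
               + PriorityWeightJobSize   * jobsize_factor
               + PriorityWeightPartition * partition_factor
               + PriorityWeightQOS       * qos_factor

where each factor is normalized to a value between 0.0 and 1.0.  So with the
weights above, QOS dominates, then fairshare, then age, and the JobSize and
Partition terms are zeroed out.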

I'm happy to chat about any of the settings if you want, or share our full config.
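
On the JobSubmitPlugins=lua line: Slurm expects the script to be named
job_submit.lua and to live in the same directory as slurm.conf.  A minimal
sketch of the shape of such a script (placeholder partition name and logic,
not our actual rules) would be:

-- job_submit.lua: minimal sketch only
function slurm_job_submit(job_desc, part_list, submit_uid)
   -- Route jobs that did not request a partition to a default one.
   if job_desc.partition == nil then
      job_desc.partition = "general"   -- placeholder partition name
      slurm.log_user("No partition requested; routing job to 'general'")
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, part_list, job_rec, submit_uid)
   return slurm.SUCCESS
end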

-Paul Edmon-

On 5/29/19 10:17 AM, Julius, Chad wrote:

All,

We rushed our Slurm install due to a short timeframe and missed some important items.  We are now looking to implement a better system than the first-in, first-out we have now.  My question: are the defaults listed in the slurm.conf file a good start?  Would anyone be willing to share the Scheduling section of their .conf?  Also, we are looking to increase the maximum array size, but I don't see that in the slurm.conf in version 17.  Am I looking at an upgrade of Slurm in the near future, or can I just add MaxArraySize=somenumber?

The defaults as of 17.11.8 are:

# SCHEDULING
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=100000
#PriorityWeightAge=1000
#PriorityWeightPartition=10000
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0

Chad Julius
Cyberinfrastructure Engineer Specialist
Division of Technology & Security
SOHO 207, Box 2231
Brookings, SD 57007
Phone: 605-688-5767
www.sdstate.edu
