Hello Slurm Users,

Our system does not allow much testing at the moment, so I want to draw on 
community knowledge. The multifactor priority plugin has many knobs to tweak, 
which makes it powerful and daunting at the same time. Basically, how do you 
set it up for various user groups based on urgency, resource usage, and 
resource guarantees? I am thinking of these groups:


  1.  daily users: medium requirements for RAM, CPU, GPU, and storage. They can 
wait if the resources are busy. Their jobs may even be suspended/paused to free 
resources for other needs.
  2.  deadliners: need constant and guaranteed access to resources; they cannot 
wait.
  3.  developers: run short, light jobs but require real-time or near-real-time 
responsiveness.

In an ideal world, we would simply have dedicated nodes for each of these 
needs. However, if you can’t afford that many nodes, can you mix these three 
needs on the same nodes?

For example, I have a front-end node (FN) and two compute nodes, N1 and N2. 
This is the default partition:

PartitionName=daily Nodes=ALL Default=YES DefMemPerCPU=0 State=UP OverSubscribe=NO MaxTime=INFINITE SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE DefCpuPerGPU=2

How do I define a ‘developers’ partition that allows users to run temporary 
debug sessions with a maximum walltime of 8 hours (MaxTime=8:00:00)? 
Furthermore, jobs in this partition should have the highest priority and 
preferably start right away. Do I also need to set up a ‘debug’ QOS? Last but 
not least, when the time is up for a job in this partition, can I have the job 
placed in a suspended state?
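
To make the question concrete, here is roughly the kind of configuration I have 
in mind. It is an untested sketch: the PriorityTier values are placeholders I 
picked, and I am only assuming that partition-based preemption 
(PreemptType=preempt/partition_prio) is the right mechanism here. Options 
from my current ‘daily’ line that are not relevant to the question are left out.

```
# Untested sketch. The PriorityTier values and the choice of
# partition-based preemption are assumptions, not a verified setup.
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

# Lower tier: jobs here may be suspended by jobs from a higher tier.
PartitionName=daily Nodes=ALL Default=YES MaxTime=INFINITE PriorityTier=10 PreemptMode=SUSPEND

# Higher tier: short debug sessions, capped at 8 hours, never preempted.
PartitionName=developers Nodes=ALL Default=NO MaxTime=08:00:00 PriorityTier=30 PreemptMode=OFF
```

Whether OverSubscribe has to change for suspension to work, and whether a QOS 
is still needed on top of this, is exactly what I am unsure about.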



Vang Quy Le
Special Consultant in Data Science and Infrastructure

T: (+45) 9940 7710 | Email: v...@its.aau.dk
Kontor 0-1-91 | Selma Lagerløfs Vej 300 | DK-9220 Aalborg Ø
