PS: we're using Slurm 17.11.5

On 21.03.2018 at 16:18, Henkel, Andreas <hen...@uni-mainz.de> wrote:


Hi,


Recently, while trying out a new configuration, I came across a problem. In 
principle, we have one big partition containing all nodes, with PriorityTier=2. 
Each account has GrpTRESRunMin=cpu=<#somelimit> set. Every now and then we have 
the situation that some of the nodes are idle. To make use of them, we created 
a second partition with PriorityTier=1 and a GrpTRESRunMin limit set as 
partition QOS, which is larger than the one set on each account. This raises 
another problem: since both partitions are configured identically (except for 
the points mentioned), jobs running in the scavenger partition prevent jobs in 
the regular partition from starting; those jobs stay pending with reason 
AssocGrpTRESRunLimit. We wanted to solve this by using UsageFactor<<1 on the 
scavenger partition or on the jobs running in that partition. Unfortunately, 
GrpTRESRunMin doesn't take UsageFactor into account. Is that correct? We would 
like jobs running in scavenger not to count towards the TRESRunMin limit while 
they run, but still to be accounted for normally after they finish. Any hints? 
Also, any hints as to where in the source code I could accommodate this 
(guessing acct_policy.c at the moment)?
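
For illustration, the setup described above corresponds roughly to the 
following sketch. The partition name "main", the QOS name "scavenger", the 
UsageFactor value and all <...> placeholders are assumptions for this example, 
not our actual values; the limit is written GrpTRESRunMins here, as sacctmgr 
spells it.

```
# slurm.conf (excerpt): two partitions over the same nodes
PartitionName=main      Nodes=<all nodes> PriorityTier=2
PartitionName=scavenger Nodes=<all nodes> PriorityTier=1 QOS=scavenger

# Per-account limit on the CPU-minutes of running jobs
sacctmgr modify account name=<account> set GrpTRESRunMins=cpu=<#somelimit>

# Partition QOS for scavenger: larger limit, reduced usage factor
sacctmgr add qos scavenger
sacctmgr modify qos scavenger set GrpTRESRunMins=cpu=<#largerlimit> UsageFactor=0.1
```

With this layout, a job running in scavenger still counts against the 
GrpTRESRunMins limit of its account, which is what blocks the jobs in main.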


Best regards,


Andreas


Dr. Andreas Henkel
COO HPC
Data Center
JGU Mainz
Anselm-Franz-von-Bentzelweg 12
55099 Mainz
