PS: We're using Slurm 17.11.5.

On 21.03.2018 at 16:18, Henkel, Andreas <hen...@uni-mainz.de> wrote:
Hi,

recently, while trying out a new configuration, I came across a problem. In principle, we have one big partition containing all nodes, with PriorityTier=2. Each account has GrpTRESRunMin=cpu=<#somelimit> set. Every now and then, some of the nodes sit idle. To make use of them, we created a second partition with PriorityTier=1 and a GrpTRESRunMin limit, set as a partition QOS, that is larger than the one set on each account.

This raises another problem: since both partitions are configured identically (except for the points mentioned), jobs running in the scavenger partition keep jobs in the regular partition from starting. Those jobs stay pending with reason AssocGrpTRESRunLimit.

We wanted to solve this by setting UsageFactor<<1 on the scavenger partition or on the jobs running in it. Unfortunately, GrpTRESRunMin doesn't take UsageFactor into account. Is that correct?

We would like jobs in the scavenger partition not to count toward the TRESRunMin while they are running, but to be accounted for normally once they have finished.

Any hints? Also, any hints as to where in the source code I could accommodate this (I'm guessing acct_policy.c at the moment)?

Best regards,
Andreas

Dr. Andreas Henkel
COO HPC
Data Center
JGU Mainz
Anselm-Franz-von-Bentzelweg 12
55099 Mainz
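To make the behaviour asked about above concrete, here is a toy model in Python. It is not Slurm code and does not mirror acct_policy.c. It assumes that the running-minutes limit is checked against the sum of CPUs times remaining time limit over an account's running jobs, and it ignores the separate partition QOS limit. The names `Job`, `running_cpu_minutes`, `can_start`, and the `usage_factor` mapping are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cpus: int
    remaining_min: int  # minutes left until the job's time limit
    partition: str      # e.g. "regular" or "scavenger" (illustrative names)


def running_cpu_minutes(jobs, usage_factor=None):
    """Sum cpus * remaining minutes over the given running jobs.

    usage_factor is a hypothetical per-partition weight; partitions
    that are not listed count with weight 1.0.
    """
    usage_factor = usage_factor or {}
    return sum(
        job.cpus * job.remaining_min * usage_factor.get(job.partition, 1.0)
        for job in jobs
    )


def can_start(new_job, running, limit, usage_factor=None):
    """Return True if starting new_job keeps the account within its limit."""
    total = running_cpu_minutes(running + [new_job], usage_factor)
    return total <= limit


# One scavenger job is running; a regular job from the same account is waiting.
running = [Job(cpus=16, remaining_min=600, partition="scavenger")]
waiting = Job(cpus=8, remaining_min=120, partition="regular")

# Unweighted: the scavenger job uses up most of the account's limit.
print(can_start(waiting, running, limit=10000))

# Desired behaviour: scavenger jobs are weighted down while running.
print(can_start(waiting, running, limit=10000, usage_factor={"scavenger": 0.0}))
```

In this model the first check fails because the scavenger job alone accounts for 9600 of the 10000 allowed CPU-minutes, while the second succeeds once scavenger jobs are weighted to zero. Whether and where Slurm could apply such a weight is exactly the open question of this message.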