Try suspending and resuming the user's pending jobs to force a re-evaluation.

If the user's jobs are not within the window of jobs that the scheduler 
evaluates, i.e. if enough higher-priority jobs have queued ahead of them, then 
these jobs may not have been evaluated for scheduling since a point in time 
when the user really was pending for that reason, so the reason shown may be 
stale.
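
As a rough sketch of the above, assuming that hold/release is an acceptable 
stand-in for "suspend and resume" on jobs that have not started yet (scontrol 
suspend is meant for running jobs), something like the following could be used. 
The function name reevaluate_pending is made up for illustration, and "auser" 
is just the example user from the original message.

```shell
# Hold and then release every pending job of one user so that the
# scheduler looks at each of them again.
# Assumption: hold/release is used here in place of suspend/resume,
# because the jobs in question are still pending.
reevaluate_pending() {
    user="$1"
    # -h: no header, -u: this user only, -t PD: pending jobs only,
    # -o "%i": print just the job ID
    squeue -h -u "$user" -t PD -o "%i" | while read -r jobid; do
        scontrol hold "$jobid"
        scontrol release "$jobid"
    done
}

# Usage:
#   reevaluate_pending auser
```

Afterwards, checking the reason column in squeue again should show whether 
the jobs are still pending on the QOS limit or on something else.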

Jenny

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Simon 
Andrews
Sent: Monday, April 27, 2020 5:58 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] QOS cutting off users before CPU limit is reached

I'm trying to use QoS limits to dynamically change the number of CPUs a user is 
allowed to use on our cluster.  As far as I can see I'm setting the appropriate 
GrpTRES=cpu value and I can read that back, but then jobs are being stopped 
before the user has reached that limit.

In squeue I see loads of lines like:

166599    normal nf-BISMARK_(288)               auser     PD       0:00      1 (QOSMaxCpuPerUserLimit)

..but if I run:

squeue -t running -p normal --format="%.12u %.2t %C "

Then the total for that user is 288 cores, but in the QoS configuration they 
should be allowed more.  If I run:

sacctmgr show user WithAssoc format=user%12,GrpTRES

..then I get:

    auser      cpu=512

What am I missing?  Why is 'auser' not being allowed to use all 512 of their 
allowed CPUs before the QOS limit is kicking in?

Thanks for any help you can offer.

Simon.

