On 31.08.2010, at 16:51, Rahul Nabar wrote:

> My scheduler, Torque, flags compute nodes as "busy" when the load gets
> above a threshold "ideal load". My settings on 8-core compute nodes
> have this ideal_load set to 8, but I am wondering whether this is
> appropriate or not?
>
> $max_load 9.0
> $ideal_load 8.0
>
> I do understand the "ideal load = # of cores" heuristic, but in at least

Yep.

> 30% of our jobs (if not more) I find the load average greater than
> 8, sometimes even in the 9-10 range. But does this mean there is
> something wrong, or do I take this to be the "happy" scenario for HPC:
> i.e. not only are all CPUs busy, but the pipeline of processes waiting
> for their CPU slice is also relatively full? After all, an
> "under-loaded" HPC node is a waste of an expensive resource?

With recent kernels, (kernel) processes in D state also count as
running. Hence the load appears higher than adding up only the truly
running processes would imply.

--
Reuti
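To check this on a live node, one can compare the 1-minute load
average against the number of tasks in R (running) and D
(uninterruptible sleep) state. A minimal sketch, assuming a Linux
/proc filesystem; it counts processes rather than individual threads,
so the figures are approximate:

#!/usr/bin/env python
# Compare the 1-minute load average with the number of tasks that are
# currently runnable (R) or in uninterruptible sleep (D).
import os

def task_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat."""
    try:
        with open("/proc/%s/stat" % pid) as f:
            data = f.read()
        # The comm field may contain spaces or parentheses, so take
        # the state from after the last ')'.
        return data.rsplit(")", 1)[1].split()[0]
    except (IOError, IndexError):
        return None  # process exited between listing and reading

states = [task_state(p) for p in os.listdir("/proc") if p.isdigit()]
running = states.count("R")
dstate = states.count("D")

with open("/proc/loadavg") as f:
    load1 = float(f.read().split()[0])

print("1-min load: %.2f  R: %d  D: %d  R+D: %d"
      % (load1, running, dstate, running + dstate))

On a fully subscribed 8-core node one would expect R+D to hover around
8; a persistent surplus of D-state tasks would point at I/O wait
rather than CPU contention.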
> On the other hand, if there truly were something wrong with a node[*]
> and I were to use a high load average as one of the signs of
> impending trouble, what would be a good threshold? Above what
> load average on a compute node do people actually get worried? It
> makes sense to set PBS's default "busy" warning to that limit instead
> of just "8".
>
> I'm ignoring the 5/10/15 min load average distinction. I'm assuming
> Torque is using the most appropriate one!
>
> *e.g. runaway process, infinite loop in user code, multiple jobs
> accidentally assigned to some node, etc.
>
> --
> Rahul
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf