On 31.08.2010, at 16:51, Rahul Nabar wrote:

> My scheduler, Torque, flags compute nodes as "busy" when the load gets
> above a threshold "ideal load". My settings on 8-core compute nodes
> have this ideal_load set to 8, but I am wondering whether this is
> appropriate or not?
>
> $max_load 9.0
> $ideal_load 8.0
>
> I do understand the "ideal load = # of cores" heuristic, but in at least

Yep.

> 30% of our jobs (if not more) I find the load average greater than
> 8, sometimes even in the 9-10 range. But does this mean there is
> something wrong, or do I take this to be the "happy" scenario for HPC:
> i.e. not only are all CPUs busy, but the pipeline of processes waiting
> for their CPU slice is also relatively full? After all, an
> "under-loaded" HPC node is a waste of an expensive resource?

With recent kernels, (kernel) processes in D state also count as
running. Hence the load appears higher than adding up only the truly
running processes would imply.

--
Reuti
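To check this on a live node, one can compare the 1-minute load
average against the number of tasks in R (running) and D
(uninterruptible sleep) state. A minimal sketch, assuming a Linux
/proc filesystem; it counts processes rather than individual threads,
so the figures are approximate:

#!/usr/bin/env python
# Compare the 1-minute load average with the number of tasks that are
# currently runnable (R) or in uninterruptible sleep (D).
import os

def task_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat."""
    try:
        with open("/proc/%s/stat" % pid) as f:
            data = f.read()
        # The comm field may contain spaces or parentheses, so take
        # the state from after the last ')'.
        return data.rsplit(")", 1)[1].split()[0]
    except (IOError, IndexError):
        return None  # process exited between listing and reading

states = [task_state(p) for p in os.listdir("/proc") if p.isdigit()]
running = states.count("R")
dstate = states.count("D")

with open("/proc/loadavg") as f:
    load1 = float(f.read().split()[0])

print("1-min load: %.2f  R: %d  D: %d  R+D: %d"
      % (load1, running, dstate, running + dstate))

On a fully subscribed 8-core node one would expect R+D to hover around
8; a persistent surplus of D-state tasks would point at I/O wait
rather than CPU contention.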
> On the other hand, if there truly were something wrong with a node[*]
> and I were to use a high load average as one of the signs of
> impending trouble, what would be a good threshold? Above what
> load average on a compute node do people actually get worried? It
> makes sense to set PBS's default "busy" warning to that limit instead
> of just "8".
>
> I'm ignoring the 5/10/15 min load average distinction. I'm assuming
> Torque is using the most appropriate one!
>
> *e.g. runaway process, infinite loop in user code, multiple jobs
> accidentally assigned to some node, etc.
>
> --
> Rahul
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf