Steve Kargl wrote:
On Mon, Oct 12, 2009 at 03:35:15PM +1100, Alex R wrote:
Steve Kargl wrote:
On Mon, Oct 12, 2009 at 01:49:27PM +1100, Alex R wrote:
Steve Kargl wrote:
So, you have 4 cpus and 4 folding-at-home processes and you're
trying to use the system with other apps?  Switch to 4BSD.


I thought SCHED_ULE was meant to be a much better choice in an SMP environment. Why are you suggesting he rebuild his kernel and use the legacy scheduler?

If you have N cpus and N+1 numerically intensive applications,
ULE may have poor performance compared to 4BSD.  In the OP's case,
he has 4 cpus and 4 numerically intensive (?) applications.  He is,
however, also trying to use the system in some interactive way.
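
For reference, the scheduler is selected at kernel build time, so switching to
4BSD means rebuilding the kernel with the other scheduler compiled in.  A
minimal kernel-config sketch (the config name is made up; only the two
scheduler options matter):

    # hypothetical /usr/src/sys/amd64/conf/MYKERNEL
    include     GENERIC
    ident       MYKERNEL
    nooptions   SCHED_ULE      # drop ULE inherited from GENERIC
    options     SCHED_4BSD     # use the legacy 4BSD scheduler instead

followed by the usual buildkernel/installkernel cycle and a reboot.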

Ah ok. Is this just an accepted thing by the FreeBSD devs, or are they trying to fix it?


Jeff appears to be extremely busy with other projects.  He is aware of
the problem, and I have set up my system to give him access when/if it
is so desired.

Here's the text of my last set of tests that I sent to him

OK, I've managed to recreate the problem.  User kargl launches an MPI
job on node10 that creates two images on node20.  This is command z
in the top(1) info.  30 seconds later, user sgk launches an MPI process
on node10 that creates 8 images on node20.  This is command rivmp in
the top(1) info.  With 8 available cpus, this is a (slightly) oversubscribed
node.
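
The exact launch commands aren't shown here; purely as an illustration (the
binary names z and rivmp come from the top(1) output below, the host list and
flags are assumed and will vary with the MPI implementation):

    mpirun -np 2 -host node20 ./z        # the 2-image netpipe job
    mpirun -np 8 -host node20 ./rivmp    # the 8-image rivmp job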

For 4BSD, I see

last pid:  1432;  load averages:  8.68,  5.65,  2.82    up 0+01:52:14  17:07:22
40 processes:  11 running, 29 sleeping
CPU:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
Mem: 32M Active, 12M Inact, 203M Wired, 424K Cache, 29M Buf, 31G Free
Swap: 4096M Total, 4096M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    CPU COMMAND
 1428 sgk           1 124    0 81788K  5848K CPU3    6   1:13 78.81% rivmp
 1431 sgk           1 124    0 81788K  5652K RUN     1   1:13 78.52% rivmp
 1415 kargl         1 124    0 78780K  4668K CPU7    1   1:38 78.42% z
 1414 kargl         1 124    0 78780K  4664K CPU0    0   1:37 77.25% z
 1427 sgk           1 124    0 81788K  5852K CPU4    3   1:13 78.42% rivmp
 1432 sgk           1 124    0 81788K  5652K CPU2    4   1:13 78.27% rivmp
 1425 sgk           1 124    0 81788K  6004K CPU5    5   1:12 78.17% rivmp
 1426 sgk           1 124    0 81788K  5832K RUN     6   1:13 78.03% rivmp
 1429 sgk           1 124    0 81788K  5788K CPU6    7   1:12 77.98% rivmp
 1430 sgk           1 124    0 81788K  5764K RUN     2   1:13 77.93% rivmp


Notice that the accumulated times appear reasonable.  At this point in the
computation, rivmp is doing no communication between processes.  z is
the netpipe benchmark and is essentially sending messages between its
two processes over the memory bus.
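
To make that concrete, here is a minimal ping-pong sketch in C of the sort of
exchange netpipe times between its two ranks; it is an illustration only, not
the actual netpipe source:

    #include <mpi.h>
    #include <string.h>

    int
    main(int argc, char **argv)
    {
        char buf[1024];
        int i, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        memset(buf, 0, sizeof(buf));
        /* Bounce a small message back and forth between rank 0 and rank 1.
         * When both ranks sit on one node, the traffic goes over shared
         * memory rather than the network. */
        for (i = 0; i < 100000; i++) {
            if (rank == 0) {
                MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                    MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                    MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return (0);
    }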


For ULE, I see

last pid:  1169;  load averages:  7.56,  2.61,  1.02    up 0+00:03:15  17:13:01
40 processes:  11 running, 29 sleeping
CPU:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
Mem: 31M Active, 9392K Inact, 197M Wired, 248K Cache, 26M Buf, 31G Free
Swap: 4096M Total, 4096M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    CPU COMMAND
 1168 sgk           1 118    0 81788K  5472K CPU6    6   1:18 100.00% rivmp
 1169 sgk           1 118    0 81788K  5416K CPU7    7   1:18 100.00% rivmp
 1167 sgk           1 118    0 81788K  5496K CPU5    5   1:18 100.00% rivmp
 1166 sgk           1 118    0 81788K  5564K RUN     4   1:18 100.00% rivmp
 1151 kargl         1 118    0 78780K  4464K CPU3    3   1:48 99.27% z
 1152 kargl         1 110    0 78780K  4464K CPU0    0   1:18 62.89% z
 1164 sgk           1 113    0 81788K  5592K CPU1    1   0:55 80.76% rivmp
 1165 sgk           1 110    0 81788K  5544K RUN     0   0:52 62.16% rivmp
 1163 sgk           1 107    0 81788K  5624K RUN     2   0:40 50.68% rivmp
 1162 sgk           1 107    0 81788K  5824K CPU2    2   0:39 50.49% rivmp


In the above, processes 1162-1165 are clearly not receiving sufficient time
slices to keep up with the other 4 rivmp images.  From watching top at a
1-second interval, once the 4 rivmp processes hit 100% CPU, they stayed pinned
to their cpus and stayed at 100% CPU.  It can also be seen that processes 1152,
1165 and 1162, 1163 are stuck on cpus 0 and 2, respectively.
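
One way to double-check that (an assumed diagnostic step, not something done in
the run above) is to ask cpuset(1) for the affinity mask of the affected
processes:

    cpuset -g -p 1162
    # e.g.: pid 1162 mask: 0, 1, 2, 3, 4, 5, 6, 7

If the mask still covers all eight cpus, the processes are not administratively
pinned, and it is the scheduler itself that is failing to migrate them off the
shared cpus.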


This isn't bound only to floating-point-intensive applications; even the operating system itself seems to suffer under SCHED_ULE. I have seen and reported several performance issues under heavy load, where for seconds (if not minutes!) 4+ CPU boxes get as stuck as a UP box does. Those sticky situations are painful in cases where the box needs to be accessed via X11. The remaining four FreeBSD 8.0 boxes used for numerical applications in our lab (the others switched to Linux a long time ago) all use SCHED_ULE, since this scheduler was introduced as the superior replacement for the legacy 4BSD. Well, I'll give 4BSD a chance again.
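
A quick way to confirm which scheduler a given box is actually running, before
rebuilding anything, is the kern.sched.name sysctl:

    sysctl kern.sched.name
    # kern.sched.name: ULE     (or 4BSD)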

At the moment, even our 8-core Dell PowerEdge box is in production use, but if there is something I can do, meaning benchmarking, I'll give it a try.
Regards,
Oliver
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[email protected]"