Steve Kargl wrote:
On Mon, Oct 12, 2009 at 03:35:15PM +1100, Alex R wrote:
Steve Kargl wrote:
On Mon, Oct 12, 2009 at 01:49:27PM +1100, Alex R wrote:
Steve Kargl wrote:
So, you have 4 cpus and 4 folding-at-home processes and you're
trying to use the system with other apps? Switch to 4BSD.
I thought SCHED_ULE was meant to be a much better choice in an SMP
environment. Why are you suggesting he rebuild his kernel and use the
legacy scheduler?
If you have N cpus and N+1 numerically intensive applications,
ULE may have poor performance compared to 4BSD. In the OP's case,
he has 4 cpus and 4 numerically intensive (?) applications. However,
he is also trying to use the system in some interactive way.
Ah, ok. Is this just an accepted thing among the FreeBSD devs, or are they
trying to fix it?
Jeff appears to be extremely busy with other projects. He is aware of
the problem, and I have set up my system to give him access when/if it
is so desired.
Here's the text of my last set of tests that I sent to him:
OK, I've managed to recreate the problem. User kargl launches an mpi
job on node10 that creates two images on node20. This is command z
in the top(1) info. 30 seconds later, user sgk launches an mpi process
on node10 that creates 8 images on node20. This is command rivmp in the
top(1) info. With 8 available cpus, this is a (slightly) oversubscribed
node.
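(Aside: for anyone wanting to approximate this kind of oversubscription
without the actual codes, a trivial CPU-bound stand-in is enough. This is
only a sketch, not the real rivmp/z programs:

    /* burn.c -- hypothetical stand-in for one numerically intensive image.
     * Build with: cc -O0 -o burn burn.c
     * Spins forever in a pure user-mode floating-point loop. */
    int
    main(void)
    {
            volatile double x = 0.0;

            for (;;)
                    x += 1.0e-9;    /* all user time, no syscalls */
    }

Starting 10 copies in the background on an 8-cpu node, e.g.
"for i in 1 2 3 4 5 6 7 8 9 10; do ./burn & done", reproduces the same
10-runnable-on-8-cpus situation as the z + rivmp mix described above.)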
For 4BSD, I see
last pid: 1432;  load averages: 8.68, 5.65, 2.82   up 0+01:52:14  17:07:22
40 processes: 11 running, 29 sleeping
CPU: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
Mem: 32M Active, 12M Inact, 203M Wired, 424K Cache, 29M Buf, 31G Free
Swap: 4096M Total, 4096M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
1428 sgk 1 124 0 81788K 5848K CPU3 6 1:13 78.81% rivmp
1431 sgk 1 124 0 81788K 5652K RUN 1 1:13 78.52% rivmp
1415 kargl 1 124 0 78780K 4668K CPU7 1 1:38 78.42% z
1414 kargl 1 124 0 78780K 4664K CPU0 0 1:37 77.25% z
1427 sgk 1 124 0 81788K 5852K CPU4 3 1:13 78.42% rivmp
1432 sgk 1 124 0 81788K 5652K CPU2 4 1:13 78.27% rivmp
1425 sgk 1 124 0 81788K 6004K CPU5 5 1:12 78.17% rivmp
1426 sgk 1 124 0 81788K 5832K RUN 6 1:13 78.03% rivmp
1429 sgk 1 124 0 81788K 5788K CPU6 7 1:12 77.98% rivmp
1430 sgk 1 124 0 81788K 5764K RUN 2 1:13 77.93% rivmp
Notice that the accumulated times appear reasonable (with 10 CPU-bound
processes sharing 8 cpus, each getting roughly 80% of a cpu is about what
one would expect). At this point in the computations, rivmp is doing no
communication between processes. z is the netpipe benchmark and is
essentially sending messages between the two processes over the memory bus.
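(For reference, the MPI flavor of netpipe is typically started as something
like "mpirun -np 2 NPmpi" with both ranks placed on the same node, so the
traffic never leaves shared memory; the exact binary name and mpirun flags
depend on how NetPIPE and the MPI stack were built.)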
For ULE, I see
last pid: 1169;  load averages: 7.56, 2.61, 1.02   up 0+00:03:15  17:13:01
40 processes: 11 running, 29 sleeping
CPU: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
Mem: 31M Active, 9392K Inact, 197M Wired, 248K Cache, 26M Buf, 31G Free
Swap: 4096M Total, 4096M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
1168 sgk 1 118 0 81788K 5472K CPU6 6 1:18 100.00% rivmp
1169 sgk 1 118 0 81788K 5416K CPU7 7 1:18 100.00% rivmp
1167 sgk 1 118 0 81788K 5496K CPU5 5 1:18 100.00% rivmp
1166 sgk 1 118 0 81788K 5564K RUN 4 1:18 100.00% rivmp
1151 kargl 1 118 0 78780K 4464K CPU3 3 1:48 99.27% z
1152 kargl 1 110 0 78780K 4464K CPU0 0 1:18 62.89% z
1164 sgk 1 113 0 81788K 5592K CPU1 1 0:55 80.76% rivmp
1165 sgk 1 110 0 81788K 5544K RUN 0 0:52 62.16% rivmp
1163 sgk 1 107 0 81788K 5624K RUN 2 0:40 50.68% rivmp
1162 sgk 1 107 0 81788K 5824K CPU2 2 0:39 50.49% rivmp
In the above, processes 1162-1165 are clearly not receiving sufficient time
slices to keep up with the other 4 rivmp images. From watching top at a
1-second interval, once 4 of the rivmp images hit 100% CPU, they stayed
pinned to their cpus and remained at 100% CPU. It can also be seen that
processes 1152, 1165 and 1162, 1163 are stuck on cpus 0 and 2, respectively.
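(For watching this live, something like "top -P -s 1" is handy: -s 1 gives
the 1-second refresh mentioned above, and -P breaks the CPU summary out per
cpu, so pegged versus under-used cpus stand out at a glance. The C column in
the listings above is what shows the cpu each process last ran on.)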
This isn't limited to floating-point-intensive applications; even the
operating system itself seems to suffer under SCHED_ULE. I have seen, and
reported, several performance issues under heavy load where, for seconds (if
not minutes!), 4+ CPU boxes get as stuck as a UP box does. Those sticky
situations are painful in cases where the box needs to be accessed via
X11.
The remaining four FreeBSD 8.0 boxes used for numerical applications in
our lab (the others switched to Linux a long time ago) all use SCHED_ULE,
since this scheduler was introduced as the superior successor to the
legacy 4BSD. Well, I'll give 4BSD a chance again.
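For the record, the switch itself is just a kernel rebuild with the
scheduler option flipped. A minimal custom config (the name MYKERNEL-4BSD
is of course arbitrary) should look roughly like:

    include         GENERIC
    ident           MYKERNEL-4BSD
    nooptions       SCHED_ULE
    options         SCHED_4BSD

followed by the usual "make buildkernel KERNCONF=MYKERNEL-4BSD" and
"make installkernel KERNCONF=MYKERNEL-4BSD" in /usr/src, and a reboot.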
At the moment even our 8-core Dell PowerEdge box is in production use,
but if there is something I can do, meaning benchmarking, I'll give it a try.
Regards,
Oliver