For 32 processes (4 processes per node), the arrays with 512-Byte size are
communicated more slowly than the 4096-Byte size arrays. For both of them, we
do you mean that this is not the case in other configurations?
an interconnect _should_ have some steep rise in effective bandwidth
as packet size is increased. it's a useful metric to know the packet
size at which half-peak bandwidth is achieved, since this offers some
"sense of scale" to programmers judging whether their own packet sizes
are appropriate.
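
As a rough illustration of the half-peak metric, here is a minimal Python sketch that interpolates the message size at which bandwidth reaches half of peak. The function name `half_peak_size` is hypothetical. The sample points are the osu_bw figures quoted in this thread, and treating the largest measured value (1022.52 MB/s at 8192 bytes) as "peak" is purely an assumption for the example; the real peak of the interconnect is probably higher, which would push the half-peak size upward.

```python
def half_peak_size(samples, peak=None):
    """Estimate the message size (bytes) at which bandwidth reaches peak/2.

    samples: list of (size_bytes, bandwidth_MBps), sorted by increasing size.
    peak:    assumed peak bandwidth; defaults to the largest measured value
             (an assumption, since the true peak may lie beyond the samples).
    Linear interpolation is used between the two bracketing samples.
    """
    if peak is None:
        peak = max(bw for _, bw in samples)
    target = peak / 2.0
    prev = None
    for size, bw in samples:
        if bw >= target:
            if prev is None:
                # already at or above half-peak at the smallest measured size
                return float(size)
            s0, b0 = prev
            return s0 + (target - b0) * (size - s0) / (bw - b0)
        prev = (size, bw)
    return None  # half-peak is never reached in the measured range


# osu_bw figures quoted in this thread: (bytes, MB/s)
osu_bw = [(256, 417.53), (512, 592.34), (1024, 691.02),
          (2048, 857.35), (4096, 906.04), (8192, 1022.52)]

print(half_peak_size(osu_bw))
```

Passing an explicit `peak=` value (for example the vendor's quoted link rate) is likely the more honest way to use this, since a truncated sweep underestimates peak.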
this abnormal case is persistent. More specifically, communication of
4k-Byte packages is 2 times faster than the communication of 512-Byte
packages.
perhaps I'm dense this morning, but what's unexpected about that?
The OSU bandwidth and latency tests around these points show:
Byte    MB/s
256     417.53
512     592.34
1024    691.02
2048    857.35
4096    906.04
8192    1022.52
the osu_bw test is a streaming, fire-and-forget one which strongly
rewards message aggregation. (this is not necessarily deceptive -
it's measuring a real communication pattern, though it's not the
only way to quantify bandwidth.) you can see that it's aggregating
because the reported bandwidth for small packets is much higher than
you'd expect if each packet took the latency reported below.
(unless my math is wrong, 256/(2*4.79e-6) = 26.7 MB/s)
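
The same arithmetic can be repeated for every message size that appears in both tables. This is only a sketch: the helper name `latency_bound_mbps` is made up for the example, the factor of 2 is copied from the calculation above, and 1 MB is taken as 1e6 bytes.

```python
# size in bytes -> osu_bw result in MB/s (figures quoted above)
osu_bw = {256: 417.53, 512: 592.34, 1024: 691.02, 2048: 857.35, 4096: 906.04}

# size in bytes -> osu_latency result in microseconds (figures quoted below)
osu_latency_us = {256: 4.79, 512: 5.48, 1024: 6.60, 2048: 8.30, 4096: 11.02}


def latency_bound_mbps(size_bytes, latency_us):
    """Bandwidth in MB/s if every message cost 2 * latency, with no overlap."""
    return size_bytes / (2 * latency_us * 1e-6) / 1e6


for size in sorted(osu_bw):
    bound = latency_bound_mbps(size, osu_latency_us[size])
    ratio = osu_bw[size] / bound
    print(f"{size:5d} B  streaming {osu_bw[size]:7.2f} MB/s  "
          f"latency-bound {bound:6.1f} MB/s  ratio {ratio:4.1f}x")
```

A large ratio between the streaming figure and the latency-bound figure is what suggests that messages are being pipelined or aggregated rather than paying the full latency one at a time.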
Byte    Time (usec)
256     4.79
512     5.48
1024    6.60
2048    8.30
4096    11.02
So this behavior does not seem reasonable to us.
2. SOMETIMES, after the test with overall 32 processes, one of the four
processes at node3 hangs in TASK_UNINTERRUPTIBLE "D" state. Hence, the test
program shows a "done." and waits for some time. We can neither kill the
process nor soft reboot the node. We have to wait for that process to
terminate, which can take a long time.
does /proc/$pid/wchan (on the 'D' state process) tell you anything?
do all the ranks return from MPI_Finalize?
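
For the wchan question, something along these lines could be run on node3 to list every process in 'D' state together with the kernel function it is sleeping in. It is a sketch that assumes a Linux /proc layout; the function names are hypothetical, and on some kernels wchan may read as 0 or be empty unless run as root.

```python
import os


def parse_state(stat_text):
    """Return the state letter from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def blocked_processes(proc_root="/proc"):
    """Yield (pid, wchan) for every process currently in 'D' state."""
    if not os.path.isdir(proc_root):
        return  # not a Linux-style system; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = parse_state(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except (OSError, IndexError):
            continue  # process exited meanwhile, or file unreadable/malformed
        yield int(entry), wchan


if __name__ == "__main__":
    for pid, wchan in blocked_processes():
        print(pid, wchan or "(empty)")
```

If the reported function name points into the interconnect driver or a filesystem, that narrows down where the process is stuck.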
regards, mark hahn.