3. interconnect bound.

with ethernet, this is obvious, since you can just look at user/system/idle
times.

You mean the system time will be high if nodes are busy sending/receiving?

well, if the node is compute-bound, nearly all time will be user time.
if interconnect-bound, much time will be system or idle.  if system time
dominates, then the cpu or memory is too slow.  if there is idle time, your
bottleneck is probably latency (perhaps the network's, but possibly also
that of whoever you're communicating with - a compute node or fileserver.)
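that rule of thumb could be sketched as a small helper - note this is a hypothetical illustration, and the thresholds are illustrative assumptions, not anything canonical:

```python
# Hypothetical helper: classify a node from its user/system/idle CPU time
# fractions (e.g. sampled from /proc/stat or mpstat on Linux).
# Thresholds are illustrative, not canonical.
def classify(user, system, idle):
    """user/system/idle are fractions of total CPU time (sum ~ 1.0)."""
    if user > 0.8:
        return "compute-bound"          # nearly all time is user time
    if system > idle:
        return "interconnect-bound (cpu/memory too slow for the traffic)"
    return "interconnect-bound (latency: network or remote peer)"

print(classify(0.95, 0.03, 0.02))  # compute-bound
print(classify(0.40, 0.45, 0.15))  # system time dominates
print(classify(0.40, 0.10, 0.50))  # idle time dominates: latency
```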

4. headnode bound.

do you mean for NFS traffic?

More in terms of managing the responses from the compute nodes.

just job start/completes? that's normally pretty trivial, though some queueing systems make a complete hash of it...

What would you see in a tcpdump if the network was the bottleneck - lots of resends?

if the net is a bandwidth bottleneck, then you'd see lots of back-to-back
packets, adding up to near wire-speed.  if latency is the issue, you'll see
relatively long delays between request and response (in NFS, for instance).
my real point is simply that tcpdump allows you to see the unadorned truth
about what's going on. obviously, tcpdump will let you see the rate and scale of your flows, and between which nodes...
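once you have timestamped packets (say from `tcpdump -tt`), the latency case comes down to measuring the gap between each request and its response. a minimal sketch, assuming you've already parsed the dump into (timestamp, kind) events - the event kinds and the simple one-outstanding-request pairing rule are illustrative assumptions:

```python
# Sketch: pair each request with the next response and report the delays.
# Assumes events are time-ordered and at most one request is outstanding.
def request_response_delays(events):
    delays, pending = [], None
    for ts, kind in events:
        if kind == "request":
            pending = ts
        elif kind == "response" and pending is not None:
            delays.append(ts - pending)
            pending = None
    return delays

# e.g. an NFS read answered in 0.2 ms, then one that stalls for 50 ms:
events = [(0.000, "request"), (0.0002, "response"),
          (0.010, "request"), (0.060, "response")]
print(request_response_delays(events))  # roughly [0.0002, 0.05]
```

long tails in that delay distribution point at latency (or a slow peer); back-to-back packets summing to wire speed point at bandwidth.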

anything which doesn't speed up going from gigabit to IB/10G/quadrics is what I would call embarrassingly parallel...

True - I guess I'm trying to do some cost/benefit analysis, so the magnitude of the improvement is important to me... but maybe measuring it on a test cluster is the only way to be sure of this one.

well, maybe. it's a big jump from 1x Gb to IB or 10GE - I wish it were
easier to advocate Myri 2G as an intermediate step, since I actually don't
see a lot of apps showing signs of dissatisfaction with a ~250 MB/s
interconnect - and IB/10GE don't have much advantage, if any, in latency.

Not in a while - I did some testing early on when I was testing different compilers, but I don't think I did any specific MPI testing. What would you recommend - Pallas or HPL? Or something else? What's a good one that has other good publicly available reference data?

http://www.sharcnet.ca/~hahn/m-g.C is a benchmark I'm working on.  it's
mainly set up to just probe bw and latency for every pair of nodes in a
cluster (obviously diagnostic).  I have some simple scripts to turn the
results into some decent images.  it's obviously a work in progress, but
has some nice properties.  I'm thinking of collecting at least a low-res
histogram for each measure, rather than just min/avg/max, since the lat/bw
distributions might be quite interesting.
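the core of any such pairwise probe is just a ping-pong timing loop. here is a toy illustration of the idea (not the m-g.C benchmark itself, which is MPI-based) - a TCP ping-pong measuring round-trip time, run against a loopback echo server so the sketch is self-contained; in real use the host/port would point at another node:

```python
# Toy pairwise-latency probe: TCP ping-pong round-trip timing.
# Illustrative only - not the actual m-g.C benchmark.
import socket, threading, time

def echo_server(sock):
    conn, _ = sock.accept()
    with conn:
        while data := conn.recv(64):   # echo until the client disconnects
            conn.sendall(data)

def pingpong_latency(host, port, rounds=100):
    with socket.create_connection((host, port)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no Nagle
        t0 = time.perf_counter()
        for _ in range(rounds):
            s.sendall(b"x")
            s.recv(64)
        return (time.perf_counter() - t0) / rounds  # seconds per round trip

srv = socket.socket()
srv.bind(("127.0.0.1", 0))             # ephemeral port, loopback demo
srv.listen(1)
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
lat = pingpong_latency("127.0.0.1", srv.getsockname()[1])
print(f"loopback round-trip: {lat * 1e6:.1f} us")
```

run between every pair of nodes, the min/avg/max (or histogram) of such numbers is exactly the kind of diagnostic map described above.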

Interestingly enough - I enabled this on Friday and the first model we tested with showed a 2-3% performance improvement in some quick testing. We tested it with another model which uses a larger test set over the weekend and it showed a 30% improvement. So that's good news, but it's still not entirely obvious why we're seeing such a huge improvement when the network utilisation doesn't indicate that the switch is saturated - but I guess latency could be a big factor here.

I'm guessing you're simply bandwidth-limited, though it's unclear whether this is a simple bottleneck at the server, or affects "basal" intra-node communication as well.

I don't think you mentioned what your network looks like - all into one switch? what kind is it? have you verified that all the links are at 1000/fullduplex?

All the nodes are Tyan s2891 boards with onboard Broadcom bcm5704 integrated nics. They are all connected to a single hp procurve 3400cl 24-port switch. And I've verified that all ports are running at 1000/full (the switch is
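that kind of link check is easy to script. a hypothetical sketch - on Linux the live values would come from /sys/class/net/<if>/{speed,duplex} or ethtool, but here sample data is checked so the example is self-contained, and the interface names are made up:

```python
# Hypothetical sanity check: flag links that aren't running at 1000/full.
def bad_links(links, want_speed=1000, want_duplex="full"):
    """links maps 'node:iface' -> (speed_mbit, duplex)."""
    return [name for name, (speed, duplex) in links.items()
            if speed != want_speed or duplex != want_duplex]

sample = {"node01:eth0": (1000, "full"),
          "node02:eth0": (100, "half"),   # say, a failed autonegotiation
          "node03:eth0": (1000, "full")}
print(bad_links(sample))  # ['node02:eth0']
```

a single link stuck at 100/half can quietly wreck a parallel job's performance, which is why it's worth checking every port rather than a sample.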

I think that's a reasonably good switch.  one interesting thing about it
is that it supports up to 2 10G ports.  if it turns out that your nodes
are frequently waiting on your server, adding a 10G module, XFP and NIC
might be a very nice tune-up.  that assumes that the server can _do_
something at much greater than 1x Gb speeds, of course!

regards, mark hahn.
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
