3. interconnect bound.

with ethernet, this is obvious, since you can just look at user/system/idle
times.

You mean the system time will be high if nodes are busy sending/receiving?

well, if the node is compute-bound, nearly all time will be user time.
if interconnect-bound, much time will be system or idle.  if system time
dominates, then the cpu or memory is too slow.  if there is idle time, your
bottleneck is probably latency (perhaps the network's, but possibly also
that of whoever you're communicating with - a compute node or fileserver.)
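that rule of thumb could be sketched as a small helper - note this is a hypothetical illustration, and the thresholds are illustrative assumptions, not anything canonical:

```python
# Hypothetical helper: classify a node from its user/system/idle CPU time
# fractions (e.g. sampled from /proc/stat or mpstat on Linux).
# Thresholds are illustrative, not canonical.
def classify(user, system, idle):
    """user/system/idle are fractions of total CPU time (sum ~ 1.0)."""
    if user > 0.8:
        return "compute-bound"          # nearly all time is user time
    if system > idle:
        return "interconnect-bound (cpu/memory too slow for the traffic)"
    return "interconnect-bound (latency: network or remote peer)"

print(classify(0.95, 0.03, 0.02))  # compute-bound
print(classify(0.40, 0.45, 0.15))  # system time dominates
print(classify(0.40, 0.10, 0.50))  # idle time dominates: latency
```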

4. headnode bound.

do you mean for NFS traffic?

More in terms of managing the responses from the compute nodes.

just job start/completes? that's normally pretty trivial, though some queueing systems make a complete hash of it...

What would you see in a tcpdump if the network was the bottleneck - lots of resends?

if the net is a bandwidth bottleneck, then you'd see lots of back-to-back
packets, adding up to near wire-speed.  if latency is the issue, you'll see
relatively long delays between request and response (in NFS, for instance).
my real point is simply that tcpdump allows you to see the unadorned truth
about what's going on. obviously, tcpdump will let you see the rate and scale of your flows, and between which nodes...
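once you have timestamped packets (say from `tcpdump -tt`), the latency case comes down to measuring the gap between each request and its response. a minimal sketch, assuming you've already parsed the dump into (timestamp, kind) events - the event kinds and the simple one-outstanding-request pairing rule are illustrative assumptions:

```python
# Sketch: pair each request with the next response and report the delays.
# Assumes events are time-ordered and at most one request is outstanding.
def request_response_delays(events):
    delays, pending = [], None
    for ts, kind in events:
        if kind == "request":
            pending = ts
        elif kind == "response" and pending is not None:
            delays.append(ts - pending)
            pending = None
    return delays

# e.g. an NFS read answered in 0.2 ms, then one that stalls for 50 ms:
events = [(0.000, "request"), (0.0002, "response"),
          (0.010, "request"), (0.060, "response")]
print(request_response_delays(events))  # roughly [0.0002, 0.05]
```

long tails in that delay distribution point at latency (or a slow peer); back-to-back packets summing to wire speed point at bandwidth.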

anything which doesn't speed up going from gigabit to IB/10G/quadrics is what I would call embarrassingly parallel...

True - I guess I'm trying to do some cost/benefit analysis, so the magnitude of the improvement is important to me... but maybe measuring it on a test cluster is the only way to be sure of this one.

well, maybe. it's a big jump from 1x Gb to IB or 10GE - I wish it were
easier to advocate Myri 2G as an intermediate step, since I actually don't
see a lot of apps showing signs of dissatisfaction with a ~250 MB/s
interconnect - and IB/10GE don't have much advantage, if any, in latency.

Not in a while - I did some testing early on when I was testing different compilers, but I don't think I did any specific MPI testing. What would you recommend - Pallas or HPL? Or something else? What's a good one that has other good publicly available reference data?

http://www.sharcnet.ca/~hahn/m-g.C is a benchmark I'm working on.  it's
mainly set up to just probe bw and latency for every pair of nodes in a
cluster (obviously diagnostic).  I have some simple scripts to turn the
results into some decent images.  it's obviously a work in progress, but
has some nice properties.  I'm thinking of collecting at least a low-res
histogram for each measure, rather than just min/avg/max, since the lat/bw
distributions might be quite interesting.
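the core of any such pairwise probe is just a ping-pong timing loop. here is a toy illustration of the idea (not the m-g.C benchmark itself, which is MPI-based) - a TCP ping-pong measuring round-trip time, run against a loopback echo server so the sketch is self-contained; in real use the host/port would point at another node:

```python
# Toy pairwise-latency probe: TCP ping-pong round-trip timing.
# Illustrative only - not the actual m-g.C benchmark.
import socket, threading, time

def echo_server(sock):
    conn, _ = sock.accept()
    with conn:
        while data := conn.recv(64):   # echo until the client disconnects
            conn.sendall(data)

def pingpong_latency(host, port, rounds=100):
    with socket.create_connection((host, port)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no Nagle
        t0 = time.perf_counter()
        for _ in range(rounds):
            s.sendall(b"x")
            s.recv(64)
        return (time.perf_counter() - t0) / rounds  # seconds per round trip

srv = socket.socket()
srv.bind(("127.0.0.1", 0))             # ephemeral port, loopback demo
srv.listen(1)
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
lat = pingpong_latency("127.0.0.1", srv.getsockname()[1])
print(f"loopback round-trip: {lat * 1e6:.1f} us")
```

run between every pair of nodes, the min/avg/max (or histogram) of such numbers is exactly the kind of diagnostic map described above.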

Interestingly enough - I enabled this on Friday and the first model we tested with showed a 2-3% performance improvement in some quick testing. We tested it with another model which uses a larger test set over the weekend and it showed a 30% improvement. So that's good news, but it's still not entirely obvious why we're seeing such a huge improvement when the network utilisation doesn't indicate that the switch is saturated - but I guess latency could be a big factor here.

I'm guessing you're simply bandwidth-limited, though it's unclear whether this is a simple bottleneck at the server, or affects "basal" intra-node communication as well.

I don't think you mentioned what your network looks like - all into one switch? what kind is it? have you verified that all the links are at 1000/fullduplex?

All the nodes are Tyan s2891 boards with onboard Broadcom bcm5704 integrated nics. They are all connected to a single hp procurve 3400cl 24-port switch. And I've verified that all ports are running at 1000/full (the switch is
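that kind of link check is easy to script. a hypothetical sketch - on Linux the live values would come from /sys/class/net/<if>/{speed,duplex} or ethtool, but here sample data is checked so the example is self-contained, and the interface names are made up:

```python
# Hypothetical sanity check: flag links that aren't running at 1000/full.
def bad_links(links, want_speed=1000, want_duplex="full"):
    """links maps 'node:iface' -> (speed_mbit, duplex)."""
    return [name for name, (speed, duplex) in links.items()
            if speed != want_speed or duplex != want_duplex]

sample = {"node01:eth0": (1000, "full"),
          "node02:eth0": (100, "half"),   # say, a failed autonegotiation
          "node03:eth0": (1000, "full")}
print(bad_links(sample))  # ['node02:eth0']
```

a single link stuck at 100/half can quietly wreck a parallel job's performance, which is why it's worth checking every port rather than a sample.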

I think that's a reasonably good switch.  one interesting thing about it
is that it supports up to 2 10G ports.  if it turns out that your nodes
are frequently waiting on your server, adding a 10G module, XFP and NIC
might be a very nice tune-up.  that assumes that the server can _do_
something at much greater than 1x Gb speeds, of course!

regards, mark hahn.
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
