Gilad Shainer wrote:
Static routing is the best approach if your pattern is known. In other

If your pattern is known, and if it is persistent, and if it is perfectly synchronized, and if you have a single job running on the fabric, and if you have total control of the process/node mapping, and if there are no down or bad links, and if there is no other traffic pattern in the application, then yes, static routing is the best.

In real life, where there are multiple jobs running at once on various parts of a cluster, where there are always some marginal links, where you cannot guarantee which nodes a job will be allocated on, and where applications have multiple communication patterns (collectives) and load is usually unbalanced, static routing is the worst.

cases it depends on the applications. LANL and Mellanox have presented a
paper on static routing and how to get the maximum of it last ISC. There

Single app, dedicated machine, total control of the network. Similarly, I would have a pretty good shot at predicting the next lotto numbers if I knew the position (and speed) of all atoms in the universe (Dr Brown, this is for you!).

are cases where adaptive routing will show a benefit, and this is why we
see the IB vendors add adaptive routing support as well. But in general,
the average effective bandwidth is much much higher than the 40% you
claim.

Have a look at slides 17 and 19 of the following set of slides (and slides 21 and 22 to illustrate my point above):
http://www.openib.org/archives/spring2007sonoma/Monday%20April%2030/Leininger-Seager-Adaptive-Routing-OFA-Sonoma-2007-v03.pdf

Hoefler et al. have shown an average effective bisection of ~40% on InfiniBand (OMNeT simulations) in a paper submitted to Cluster2008. In a paper to be presented at Hot Interconnects this year, I measured the effective bisection (SendRecv on random pairs) on a 512-node Myri-10G cluster (single enclosure, 32-port crossbars) under various routing implementations. Below is the link to pretty graphs with static and probing adaptive routing:
http://patrick.geoffray.googlepages.com/staticvsadaptiverouting

You can see that worst-case static routing quickly drops below 40%, and the average eventually gets there as well.
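
To make "effective bisection on random pairs" concrete, here is a toy sketch in Python. It is not the OMNeT model nor the measurement code behind the graphs above; every name and parameter is hypothetical. It assumes a two-level fat tree with as many spines as hosts per leaf, one-way traffic per pair, destination-based static routing (spine = destination id modulo number of spines), and a crude bandwidth model where each flow gets 1/(worst contention on its path).

```python
import random
from collections import Counter


def effective_bisection(num_leaves, hosts_per_leaf, num_spines, rng):
    """Toy estimate of the fraction of full bandwidth seen by random pairs.

    Assumptions (illustrative only): two-level fat tree, one-way flows,
    static routing keyed on the destination id, and each flow limited to
    1 / (number of flows on its most contended link).
    """
    n = num_leaves * hosts_per_leaf
    if n < 2:
        raise ValueError("need at least two hosts to form a pair")

    hosts = list(range(n))
    rng.shuffle(hosts)
    half = n // 2
    # Random pairing: the first half of the shuffle sends to the second half.
    pairs = list(zip(hosts[:half], hosts[half:2 * half]))

    link_load = Counter()
    paths = []
    for src, dst in pairs:
        src_leaf = src // hosts_per_leaf
        dst_leaf = dst // hosts_per_leaf
        if src_leaf == dst_leaf:
            links = []  # stays inside the leaf crossbar, no shared link
        else:
            spine = dst % num_spines  # static: depends only on destination
            links = [("up", src_leaf, spine), ("down", spine, dst_leaf)]
        for link in links:
            link_load[link] += 1
        paths.append(links)

    # Each flow is throttled by the busiest link on its path.
    rates = [
        1.0 / max((link_load[link] for link in links), default=1)
        for links in paths
    ]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    rng = random.Random(2008)
    trials = [effective_bisection(32, 16, 16, rng) for _ in range(100)]
    print("average:", sum(trials) / len(trials), "worst:", min(trials))
```

Averaging over many random pairings, and also keeping the minimum, gives the two curves one would compare: average and worst-case effective bisection. The contention comes from several sources on the same leaf being sent to the same spine, even though the fabric has full bisection on paper.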

There are some vendors that use only the 24-port switches to build very
large scale clusters - 3000 nodes and above, without any
oversubscription, and they find it more cost effective. Using single
enclosures is easier, but the cables are not expensive and you can use

The price of cables usually depends on the length (copper and fiber). Using small switches at the edges lets you use very short cables to the hosts (in-rack), but you still have to use the same number of longer cables to connect to the spine. With a single enclosure, you may need longer cables to reach the hosts (different rack), but you don't need cables to the spine, as they are on the switch backplane (and PCB is free). Short cables may not be expensive, but they are not free. Furthermore, physical cables are much less reliable than wires on a PCB, and they take more space and more power.
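
The cable arithmetic can be sketched as follows. This is a rough illustration with hypothetical names, assuming a two-level fat tree without oversubscription where each edge switch splits its ports evenly between hosts and spine uplinks. Larger fabrics need more tiers, which this sketch does not model.

```python
def cable_counts(num_hosts, ports_per_switch=24):
    """Count external cables for a two-level, non-oversubscribed fat tree.

    Assumption: each edge switch uses half its ports for hosts and half
    for uplinks, and every uplink is a physical cable. In a single
    enclosure the uplinks are backplane traces, so only host cables remain.
    """
    down = ports_per_switch // 2
    max_hosts = ports_per_switch * down  # limit of a two-level fat tree
    if not 0 < num_hosts <= max_hosts:
        raise ValueError(f"two levels support 1..{max_hosts} hosts")

    edge_switches = -(-num_hosts // down)  # ceiling division
    host_cables = num_hosts
    spine_cables = edge_switches * down  # one uplink per host-facing port
    return {
        "discrete_switches": host_cables + spine_cables,
        "single_enclosure": host_cables,
    }


if __name__ == "__main__":
    print(cable_counts(288))
```

With 24-port switches the two-level limit is 288 hosts, and the discrete build needs twice as many external cables as the single enclosure; only the length mix differs.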

Patrick