> > latency difference here matters to many codes). Perhaps of more
> > significance, though, is that you can use oversubscription to lower
> > the cost of your fabric. Instead of connecting 12 ports of a leaf
> > switch to nodes and using the other 12 ports as uplinks, you might
> > get away with 18 nodes and 6 uplinks, or 20 nodes and 4 uplinks. As
> > core counts are increasing, this is becoming more and more viable
> > for some applications.
>
> It's important to note that the "full-bisection" touted by
> vendors is on paper only. In reality, static routing provides
> full-bisection for a very small subset of patterns, the
> average effective bisection on a diameter-3 Clos is ~40% of
> link rate (adaptive routing improves that a lot, but breaks
> packet order on the wire which is a requirement for some
> network protocols).
>
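To put numbers on the leaf-switch splits quoted at the top, here is a
small sketch. The function name `oversubscription` is made up for
illustration, and the "worst-case share" figure assumes every node on the
leaf sends off-switch at once with traffic spread perfectly over the
uplinks, which is an idealization and says nothing about routing effects.

```python
from fractions import Fraction


def oversubscription(node_ports, uplink_ports):
    """Ratio of node-facing ports to uplink ports on one leaf switch.

    Hypothetical helper for illustration only.
    """
    if uplink_ports <= 0:
        raise ValueError("need at least one uplink")
    return Fraction(node_ports, uplink_ports)


# The three splits of a 24-port leaf switch mentioned in the quoted text.
for nodes, uplinks in [(12, 12), (18, 6), (20, 4)]:
    ratio = oversubscription(nodes, uplinks)
    # Assumption: all nodes send off-switch simultaneously and the
    # uplinks are perfectly balanced, so each node gets uplinks/nodes
    # of its link rate (capped at 100%).
    share = min(1.0, uplinks / nodes)
    print(f"{nodes} nodes / {uplinks} uplinks: {ratio}:1, "
          f"{share:.0%} of link rate per node")
```

So 18+6 is 3:1 and 20+4 is 5:1, against 1:1 for the 12+12 split.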
Static routing is the best approach if your pattern is known. In other
cases it depends on the applications. LANL and Mellanox presented a
paper at the last ISC on static routing and how to get the most out of
it. There are cases where adaptive routing will show a benefit, and this
is why we see the IB vendors adding adaptive routing support as well. But
in general, the average effective bandwidth is much higher than the 40%
you claim.

> In practice, "paper" full-bisection is near free when using a
> single enclosure, since all spine cables are on the
> backplane. For larger networks, where you have to pay for
> real cables to the spine level, then it may make sense to be
> oversubscribed if the effective bisection is already bad
> (static routing), or if your collective communication on
> large jobs are not bandwidth bounded. However, the latter is
> often false on many-cores.

There are some vendors that use only the 24-port switches to build
very large scale clusters - 3000 nodes and above, without any
oversubscription, and they find it more cost effective. Using single
enclosures is easier, but the cables are not expensive and you can use
the smaller components. I used the 24-port switches to connect my
96-node cluster. I will replace my current setup with the new 36-port
InfiniBand switches this month, since they provide lower latency and
adaptive routing capabilities. And if you are bandwidth bound, using
IB QDR will help. You will be able to drive more than 3 GB/s from each
server.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf