Flextronics was showing a small cluster where they had 4 boxes connected by IB and within each box they had 4 systems connected by IB. They were running ScaleMP on it. They had a graph of running Stream on top of the system. They were plotting bandwidth vs. number of cores and it was fairly linear (I didn't get a close look at it).
but stream is embarrassingly parallel, so even if their interconnect was wet string, it should scale perfectly with number of nodes. (well, start and end-of-loop synchronization probably doesn't work well with wet string, but that just means you crank up the array size ;)
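to make the point concrete, here is a rough sketch of a stream-style triad split into per-worker slices. this is not the actual STREAM source or anything ScaleMP-specific; the function names (triad_chunk, partition, triad) and the slicing scheme are made up for illustration. the thing to notice is that each slice reads and writes only its own index range, so the only cross-node traffic is at the start and end of the loop.

```python
def triad_chunk(a, b, c, scalar, start, stop):
    """Stream-style triad on one contiguous slice: a[i] = b[i] + scalar * c[i]."""
    for i in range(start, stop):
        a[i] = b[i] + scalar * c[i]


def partition(n, workers):
    """Split range(n) into `workers` contiguous, non-overlapping slices."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        # The first `extra` workers take one additional element each.
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def triad(n, workers, scalar=3.0):
    """Run the triad over n elements, one slice per (hypothetical) worker."""
    b = [1.0] * n
    c = [2.0] * n
    a = [0.0] * n
    # Run serially here; on a real machine each slice would go to a
    # different core or node. No slice touches another slice's elements,
    # so the work needs no communication between loop start and loop end.
    for start, stop in partition(n, workers):
        triad_chunk(a, b, c, scalar, start, stop)
    return a
```

with a big enough n, the per-slice work dwarfs the fixed cost of the start and end synchronization, which is why a linear bandwidth-vs-cores plot says little about the interconnect.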
does anyone know how the coherency actually works? without a full-fledged memory proxy (as SGI has in their NUMAlink machines, or as in the Newisys Horus), it seems like this approach is going to spend a lot of time twiddling the MMU and taking page faults.