Flextronics was showing a small cluster where they had 4 boxes connected
by IB and within each box they had 4 systems connected by IB. They were
running ScaleMP on it. They had a graph of STREAM running on top of the
system. They were plotting bandwidth vs. number of cores, and it was fairly
linear (I didn't get a close look at it).

But STREAM is embarrassingly parallel, so even if their interconnect were wet string, it should scale perfectly with the number of nodes. (Well, start- and end-of-loop synchronization probably doesn't work well over wet string, but that just means you crank up the array size ;)
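To make the "embarrassingly parallel" point concrete, here is a minimal sketch of the STREAM triad kernel (a[i] = b[i] + q*c[i]). The function names `triad`, `triad_chunk` and `triad_bytes` are hypothetical, and the block-split of the index range across workers is an illustration, not how STREAM or ScaleMP actually distribute work. Each iteration touches only element i of each array, so nothing crosses the interconnect inside the loop; only the start/end synchronization does.

```c
#include <stddef.h>

/* STREAM-style triad: a[i] = b[i] + q * c[i].
 * Iteration i reads b[i], c[i] and writes a[i] only, so there are no
 * cross-iteration dependencies. The pragma is ignored without OpenMP. */
static void triad(double *a, const double *b, const double *c,
                  double q, size_t n)
{
#pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Hypothetical per-node split: worker `rank` of `nranks` handles the
 * contiguous block [lo, hi). The blocks are disjoint, so no data has to
 * move between workers while the loop runs. */
static void triad_chunk(double *a, const double *b, const double *c,
                        double q, size_t n, size_t rank, size_t nranks)
{
    size_t lo = n * rank / nranks;
    size_t hi = n * (rank + 1) / nranks;
    triad(a + lo, b + lo, c + lo, q, hi - lo);
}

/* Bytes STREAM counts per triad pass: two reads and one write of a
 * double per element (write-allocate traffic is not counted). */
static size_t triad_bytes(size_t n)
{
    return 3 * sizeof(double) * n;
}
```

Reported bandwidth is then `triad_bytes(n) / elapsed_seconds`; because each worker's block is independent, aggregate bandwidth grows with the number of workers regardless of interconnect speed, as long as the array is large enough to amortize the synchronization.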

Does anyone know how the coherency actually works? Without a full-fledged
memory proxy (as SGI has in their NUMAlink machines, or as in the Newisys
Horus), it seems like this approach is going to spend a lot of time twiddling
the MMU and taking page faults.
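For illustration, here is a minimal sketch of the generic page-fault-driven approach to software coherency: pages start inaccessible, the first access traps, and the handler "fetches" the page by changing its protection. This is an assumption about the general technique, not a description of how ScaleMP works; the `dsm_*` names are hypothetical, it assumes Linux/POSIX `mmap`, `mprotect` and `sigaction`, and the remote fetch/invalidate is only a comment. (Calling `mprotect` from a signal handler is not on POSIX's async-signal-safe list, though it works on Linux.)

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t dsm_faults = 0; /* page faults taken so far */
static long dsm_page_size;

/* SIGSEGV handler: round the faulting address down to its page and make
 * that page accessible, after which the faulting access is retried. */
static void dsm_fault_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    void *page = (void *)(addr & ~(uintptr_t)(dsm_page_size - 1));
    /* A real DSM would fetch the page from its owning node here and
     * invalidate other copies before granting write access. */
    mprotect(page, (size_t)dsm_page_size, PROT_READ | PROT_WRITE);
    dsm_faults++;
}

/* Reserve npages of "shared" memory that starts out inaccessible. */
static void *dsm_alloc_pages(size_t npages)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = dsm_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;

    dsm_page_size = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, npages * (size_t)dsm_page_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* What another node's write would trigger: revoke access so the next
 * local touch faults again. `page` must be page-aligned. */
static void dsm_invalidate(void *page)
{
    mprotect(page, (size_t)dsm_page_size, PROT_NONE);
}
```

Every first touch after an invalidation costs a kernel trap plus two `mprotect` calls, and in a real system a network round trip, at page granularity. That is the overhead a hardware memory proxy tracking individual cache lines avoids.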
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
