And today a memory access can stall for hundreds of cycles, so a
processor can hide that latency by switching to another thread.
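A quick sketch of the effect (purely illustrative, nothing from the
MTA; assumes Linux, built with gcc -O2 -pthread, and all the sizes are
made up): each thread chases its own randomly-permuted pointer chain,
so nearly every load misses cache and the loads are serialized by the
data dependence. Run it with 1, 2, 4 threads on an SMT or multicore
box; the apparent ns/load per thread should stay roughly flat as total
work scales, because one thread's stalls overlap another's.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define CHAIN_LEN (1u << 22)   /* 4M nodes * 8 B = 32 MB, >> any cache */
#define STEPS     (1L << 24)

static void *chase(void *arg)
{
    size_t *chain = arg, i = 0;
    for (long s = 0; s < STEPS; s++)
        i = chain[i];           /* dependent loads: pure latency */
    return (void *)i;           /* keep the result live */
}

static size_t *make_chain(void)
{
    size_t *c = malloc(CHAIN_LEN * sizeof *c);
    if (!c) exit(1);
    for (size_t i = 0; i < CHAIN_LEN; i++)
        c[i] = i;
    /* Sattolo's shuffle: one big cycle, no short loops to hide in */
    for (size_t i = CHAIN_LEN - 1; i > 0; i--) {
        size_t j = rand() % i;
        size_t t = c[i]; c[i] = c[j]; c[j] = t;
    }
    return c;
}

int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 1;
    if (n < 1 || n > 64) n = 1;
    pthread_t tid[64];
    size_t *chains[64];
    struct timespec t0, t1;

    for (int t = 0; t < n; t++)
        chains[t] = make_chain();

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < n; t++)
        pthread_create(&tid[t], NULL, chase, chains[t]);
    for (int t = 0; t < n; t++)
        pthread_join(tid[t], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d thread(s): %.2f s wall, %.1f ns/load/thread\n",
           n, sec, sec * 1e9 / STEPS);
    return 0;
}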
My gosh ... we have re-invented the Tera MTA. ...
I think the reason we both know what that name means is that
they had (have?) a nugget of truth. After all, a multiplier
unit on a chip doesn't really care on which thread's behalf
it's doing work. The MTA is perhaps a bit far towards the pure
Gatling-gun approach, but I think we can all agree that ultimately
any program is just a big hairy dataflow graph.
But then you have to make sure the processor has enough cache and memory
bandwidth to handle the increased memory traffic (as Sun's Niagara does).
The problem with many (cores|threads) is the memory bandwidth wall: a
fixed-size pipe to memory (B bytes/s) with N requesters on that pipe
leaves each requester only B/N on average.
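To put illustrative numbers on it (mine, not measured): say B = 10 GB/s
for the whole socket and N = 8 cores sit behind it; that's 1.25 GB/s
each, while a single 2 GHz core wanting one 8-byte operand per cycle
would like 16 GB/s all by itself. Adding cores or threads without
widening the pipe just slices B thinner.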
I think that's why almost everyone agrees on the elegance of AMD's
system architecture: memory controllers attached to each CPU, so
bandwidth scales with ncpus.
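Of course that only pays off if software keeps data next to the CPU
that uses it. A minimal sketch, assuming Linux with libnuma installed
(link with -lnuma); the buffer size is arbitrary:

#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this kernel\n");
        return 1;
    }
    printf("%d memory node(s)\n", numa_max_node() + 1);

    /* Run on node 0 and allocate from node 0's local memory, so
     * loads never cross the HyperTransport links to a remote node. */
    numa_run_on_node(0);
    size_t len = 64 * 1024 * 1024;
    char *buf = numa_alloc_onnode(len, 0);
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    for (size_t i = 0; i < len; i += 4096)
        buf[i] = 1;   /* touch pages so they're really placed on node 0 */

    numa_free(buf, len);
    return 0;
}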
And yes, there's already a lot of work going on to make caches
more intelligent: predicting the multireference or sharing properties
of a cache block, for instance, to choose when to move it, and between
which caches, in a big system.
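For flavor, here's a toy software model of that kind of predictor (my
own illustration, not any real or proposed design): track the set of
cores that touch each block and pick a placement policy from the
observed sharing degree. The thresholds and table size are made up,
and __builtin_popcountll is a gcc builtin.

#include <stdio.h>
#include <stdint.h>

#define NBLOCKS 1024

static uint64_t sharers[NBLOCKS];  /* bit c set => core c touched block */

static void touch(unsigned block, unsigned core)
{
    sharers[block % NBLOCKS] |= UINT64_C(1) << core;
}

/* Decide placement from the observed sharing pattern. */
static const char *placement(unsigned block)
{
    int n = __builtin_popcountll(sharers[block % NBLOCKS]);
    if (n <= 1) return "migrate to owner's local cache";
    if (n <= 3) return "replicate in the few sharers' caches";
    return "keep one copy at the shared level";
}

int main(void)
{
    touch(7, 0); touch(7, 0);                           /* private */
    touch(9, 0); touch(9, 1);                           /* two sharers */
    touch(3, 0); touch(3, 1); touch(3, 2); touch(3, 5); /* widely shared */

    printf("block 7: %s\n", placement(7));
    printf("block 9: %s\n", placement(9));
    printf("block 3: %s\n", placement(3));
    return 0;
}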