And today a memory access can stall for hundreds of cycles, so a processor can hide that latency by switching to another thread.
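As a back-of-the-envelope sketch of that point (the numbers below are my own illustrative assumptions, not from the post): if a thread does a fixed amount of work between misses, the number of threads needed to keep the core busy follows directly from the stall length.

```python
# Rough model of latency hiding via thread switching.
# Both constants are assumed, illustrative values.
MISS_LATENCY_CYCLES = 300  # assumed round-trip stall to memory
WORK_PER_MISS = 25         # assumed cycles of useful work between misses

# While one thread waits out its stall, other threads can run.
# The core stays busy once the stall is fully covered by other
# threads' work, i.e. one running thread plus enough to fill the gap:
threads_needed = 1 + MISS_LATENCY_CYCLES // WORK_PER_MISS
print(threads_needed)  # 13
```

With these assumptions a core needs on the order of a dozen hardware threads in flight, which is roughly the regime the MTA (and later Niagara) designed for.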

My gosh ... we have re-invented the Tera MTA.  ...

I think the reason we both know what that name means is that they had (have?) a nugget of truth. After all, a multiplier unit on a chip doesn't really care on which thread's behalf it's doing work. The MTA is perhaps a bit far towards the pure gatling-gun approach, but I think we can all agree that ultimately
any program is just a big hairy dataflow graph.

But then you have to make sure the processor has enough cache and memory bandwidth to handle the increased memory traffic (as Sun's Niagara does).

The problem with many (cores|threads) is the memory bandwidth wall. A fixed-size pipe (B) to memory, with N requesters on that pipe, leaves each requester only B/N ...
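To make that concrete (the bandwidth figure below is an assumed, illustrative number, not one from the post): a fixed pipe divided among N requesters shrinks fast.

```python
# Hedged sketch of the bandwidth wall: a fixed-size pipe of B bytes/s
# shared by N requesters gives each roughly B/N.
B = 25.6e9  # assumed total memory bandwidth in bytes/s

for N in (1, 4, 8, 32):
    per_requester = B / N
    print(f"{N:2d} requesters -> {per_requester / 1e9:.1f} GB/s each")
```

At 32 requesters on this assumed pipe, each one is down to under a gigabyte per second, which is the wall the post is pointing at.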

I think that's why almost everyone agrees on the elegance of AMD's system architecture - memory attached to, and thus scaling with, the number of CPUs.
And yes, there's already a lot of work going on toward making caches
more intelligent - predicting the multi-reference or sharing properties
of a cache block, for instance, to choose when to move it and between
which caches in a big system.
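A sketch of why per-socket memory controllers are the elegant answer to the wall above (all numbers assumed for illustration): with a single shared pipe, each socket's share shrinks as sockets are added; with memory attached per socket, local bandwidth stays constant.

```python
# Contrast (assumed numbers): one shared pipe split N ways versus
# a memory controller integrated on each socket, as in AMD's design.
SHARED_BUS_BW = 10.0  # assumed GB/s on a single shared memory pipe
PER_SOCKET_BW = 10.0  # assumed GB/s per integrated memory controller

for sockets in (1, 2, 4):
    shared_share = SHARED_BUS_BW / sockets  # each socket's slice of one pipe
    local_bw = PER_SOCKET_BW                # local memory scales with sockets
    print(f"{sockets} sockets: shared pipe {shared_share:.1f} GB/s/socket, "
          f"per-socket memory {local_bw:.1f} GB/s/socket")
```

The aggregate bandwidth in the second column grows linearly with socket count, which is exactly the "scaling with ncpus" property the post credits to AMD's design.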

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
