Good point, which makes perfect sense to me.
Given that the theoretical maximum is actually 21.3 GB/s,
the real maximum Triad number must be 21.3/3 = 7.1 GB/s.

I don't get this - triad does two reads and one write.
If you don't use store-through (non-temporal stores, the 'nt' versions of mov),
then the write also implies a read for write-allocate (filling the cache line).

Without store-through, the peak theoretical number reported by stream should be 3*peak/4. The 4 is because 3r+1w actually cross the memory bus,
and the 3 is because stream doesn't give credit for the write-allocate read.
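
To make that arithmetic concrete, here is a rough sketch in Python. The function name and its arguments are made up for illustration (nothing here comes from stream itself), and it assumes every written cache line costs exactly one extra read when write-allocate is in effect.

```python
def stream_reported_peak(peak_gbs, reads, writes, write_allocate=True):
    """Best bandwidth (GB/s) stream could report for a kernel.

    peak_gbs: theoretical memory bandwidth of the node
    reads, writes: array streams the kernel reads and writes per iteration
    write_allocate: True if each write first pulls the cache line in
    """
    counted = reads + writes                      # what stream gives credit for
    moved = counted + (writes if write_allocate else 0)  # what the bus carries
    return peak_gbs * counted / moved


peak = 12.8  # GB/s, the dual-opteron node discussed in this thread

# triad: two reads, one write
print(f"{stream_reported_peak(peak, 2, 1, True):.2f}")   # 9.60 (3*peak/4)
print(f"{stream_reported_peak(peak, 2, 1, False):.2f}")  # 12.80 (nt stores)
```

So even with write-allocate, the ceiling for the 12.8GB/sec node would be about 9.6, not peak/3.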

Then how do you explain a dual opteron with two 6.4GB/sec (peak)
memory systems, 12.8GB/sec total per node, managing 9-10GB/sec?

12.8/3=4.26GB/sec.  People are seeing well over twice that.

Since pathscale does store-through, the peak really should be 12.8GB/sec,
so achieving 9-10 is decent but not paradoxical. (The peak would correspond to 1.07 Gflops, significantly below the peak theoretical
pipeline rate of 2*clock flops...)
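
For anyone who wants to check the 1.07 figure, a small sketch in Python follows. It assumes 8-byte doubles and that triad does 2 flops per iteration over the 3 words stream counts. The function name is made up for illustration, and the 2.0 GHz clock is a hypothetical value, not one given in this thread.

```python
def triad_gflops(bandwidth_gbs, bytes_per_word=8):
    """Flop rate (Gflops) of triad when limited only by memory bandwidth."""
    flops_per_iter = 2                   # one multiply, one add
    bytes_per_iter = 3 * bytes_per_word  # two reads + one write, as stream counts
    return bandwidth_gbs * flops_per_iter / bytes_per_iter


print(f"{triad_gflops(12.8):.2f}")  # 1.07

clock_ghz = 2.0                      # hypothetical clock, for comparison only
pipeline_peak = 2 * clock_ghz        # "2*clock flops" from the text above
print(triad_gflops(12.8) < pipeline_peak)  # True
```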
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
