Bill Broadley <[EMAIL PROTECTED]> wrote:
> Dual socket quad core opteron 2350's (2.0 GHz) running the current McCalpin's
> STREAM compiled with pathscale-3.0 -mp -O4:
> Total memory required = 228.9 MB.
> Function      Rate (MB/s)   Avg time   Min time   Max time
> Copy:         15355.3139    0.0104     0.0104     0.0105
> Scale:        15249.5885    0.0105     0.0105     0.0105
> Add:          14954.2883    0.0161     0.0160     0.0162
> Triad:        15061.2389    0.0160     0.0159     0.0160

So with all 8 cores at work from 2 sockets you are seeing about 70% of peak,
assuming you are using 667 MHz DDR2 (as fast as you can get until the
"Phenom" comes out, I think), which is a little better on a percentage basis
than the socket 940 numbers. That meets expectations.

I am surprised by the latency number you provide, though. Latencies in the
90 to 100+ nanos range are quite a bit higher than I expected and are edging
up into the Intel range. Perhaps this is an L3 cache delay effect -- a new
layer in the path to memory in the Barcelona. Although I see your 200 series
numbers are up there too ... I thought first byte latencies were around
65 nanos for Opteron. Am I confused?

Anyway, if the latency numbers hold up, I would say this is not the greatest
news for Barcelona. We can anticipate faster clocks, which should help, but
it makes you wonder what things would have looked like with a larger shared
L2 cache instead of an L3. This is a synthetic test, of course; what
compilers and users do to strip mine for cache will present a more realistic
assessment. Perhaps this was the trade-off driving this design.

Can I continue to think of the AMD as the first byte latency king? ... ;-) ...

rbw

--
"Making predictions is hard, especially about the future."
Niels Bohr
--
Richard Walsh
Thrashing River Consulting--
5605 Alameda St.
Shoreview, MN 55126
Phone #: 612-382-4620
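
For anyone who wants to check the "70% of peak" estimate above, here is a rough back-of-envelope sketch. It assumes two DDR2 channels per socket, 8 bytes per channel per transfer, and DDR2-667 at 667 MT/s; the post only states the 667 MHz DDR2 part, so treat the rest as my assumptions. The function names are made up for illustration.

```python
# Back-of-envelope peak memory bandwidth for the quoted dual-socket system.
# Assumptions (only the DDR2-667 speed comes from the post):
#   - 2 DDR2 channels per socket
#   - 8 bytes (64 bits) moved per channel per transfer
#   - DDR2-667 = 667 mega-transfers per second


def peak_bandwidth_mb_s(sockets, channels_per_socket,
                        bytes_per_transfer, mega_transfers_per_s):
    """Theoretical peak in MB/s (1 MB = 10**6 bytes, as STREAM reports)."""
    return (sockets * channels_per_socket
            * bytes_per_transfer * mega_transfers_per_s)


def fraction_of_peak(measured_mb_s, peak_mb_s):
    """Measured rate as a fraction of the theoretical peak."""
    return measured_mb_s / peak_mb_s


PEAK = peak_bandwidth_mb_s(2, 2, 8, 667)

# STREAM rates (MB/s) copied from the quoted results.
STREAM_RATES = {
    "Copy": 15355.3139,
    "Scale": 15249.5885,
    "Add": 14954.2883,
    "Triad": 15061.2389,
}

if __name__ == "__main__":
    print(f"Assumed peak: {PEAK} MB/s")
    for name, rate in STREAM_RATES.items():
        pct = 100 * fraction_of_peak(rate, PEAK)
        print(f"{name:6s} {rate:11.1f} MB/s  {pct:5.1f}% of peak")
```

Under those assumptions all four kernels land between roughly 70% and 72% of peak, which is consistent with the estimate in the text.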