Finally, the NDAs have expired and there's tons of technical info available on many hardware websites. I figured I'd post some more numbers. In general I'm impressed with the bandwidth (considering it uses the same DIMMs), and with the parallelism in the memory system.
Dual-socket quad-core Opteron 2350s (2.0 GHz) running the current McCalpin's STREAM compiled with pathscale-3.0 -mp -O4:

Total memory required = 228.9 MB.

Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         15355.3139     0.0104     0.0104     0.0105
Scale:        15249.5885     0.0105     0.0105     0.0105
Add:          14954.2883     0.0161     0.0160     0.0162
Triad:        15061.2389     0.0160     0.0159     0.0160
-------------------------------------------------------------
Solution Validates

(No source changes except for the N= line.)

A latency benchmark I wrote, in which each thread accesses 32 MB randomly via:

    while (p != 0) { p = a[p]; }

thread=0, 7.293 seconds, latency=108.48 ns
With 1 thread(s), max latency was 7.293 seconds, effective latency=108.48 ns.

thread=0, 7.300 seconds, latency=108.59 ns
thread=1, 7.290 seconds, latency=108.44 ns
With 2 thread(s), max latency was 7.300 seconds, effective latency=53.84 ns.

thread=0, 7.418 seconds, latency=110.35 ns
thread=1, 7.363 seconds, latency=109.53 ns
thread=2, 7.422 seconds, latency=110.41 ns
thread=3, 7.359 seconds, latency=109.47 ns
With 4 thread(s), max latency was 7.422 seconds, effective latency=26.91 ns.

thread=0, 7.417 seconds, latency=110.33 ns
thread=1, 7.394 seconds, latency=109.98 ns
thread=2, 7.433 seconds, latency=110.57 ns
thread=3, 7.382 seconds, latency=109.81 ns
thread=4, 7.448 seconds, latency=110.79 ns
thread=5, 7.379 seconds, latency=109.76 ns
thread=6, 7.443 seconds, latency=110.71 ns
thread=7, 7.411 seconds, latency=110.24 ns
With 8 thread(s), max latency was 7.448 seconds, effective latency=13.03 ns.

For comparison, an Opteron 270 (2.0 GHz as well):

thread=0, 6.301 seconds, latency=93.72 ns
With 1 thread(s), max latency was 6.301 seconds, effective latency=93.72 ns.

thread=0, 6.302 seconds, latency=93.73 ns
thread=1, 6.276 seconds, latency=93.35 ns
With 2 thread(s), max latency was 6.302 seconds, effective latency=46.47 ns.

thread=0, 12.071 seconds, latency=179.55 ns
thread=1, 12.084 seconds, latency=179.75 ns
thread=2, 11.952 seconds, latency=177.79 ns
thread=3, 12.055 seconds, latency=179.31 ns
With 4 thread(s), max latency was 12.488 seconds, effective latency=45.27 ns.
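
For anyone who wants to try the pointer-chasing loop quoted above, here is a minimal single-threaded sketch of that kind of test. It is not the benchmark that produced the numbers in this post: the 32-bit index type, the xorshift generator, the use of Sattolo's algorithm to build one random cycle, and the names next_rand, build_cycle, chase and measure_ns_per_load are all assumptions made for illustration. The threading and the "effective latency" calculation are not reproduced, since the post does not show how they were done.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* xorshift64 PRNG (assumed choice): small and portable; state must be nonzero. */
static uint64_t next_rand(uint64_t *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Fill a[0..n-1] with one random cycle (Sattolo's algorithm), so that
 * p = a[p] starting at index 0 touches every element before reaching 0. */
static void build_cycle(uint32_t *a, size_t n, uint64_t seed)
{
    uint64_t s = seed ? seed : 1;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        a[i] = (uint32_t)i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(next_rand(&s) % i);  /* 0 <= j < i */
        uint32_t t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}

/* The loop from the post; returns the number of dependent loads issued. */
static size_t chase(const uint32_t *a)
{
    size_t loads = 1;          /* counts the initial a[0] load */
    uint32_t p = a[0];
    while (p != 0) {
        p = a[p];              /* each load depends on the previous one */
        loads++;
    }
    return loads;
}

/* Average nanoseconds per load for an n-element array, or -1.0 on failure. */
static double measure_ns_per_load(size_t n, uint64_t seed)
{
    struct timespec t0, t1;
    uint32_t *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1.0;
    build_cycle(a, n, seed);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t loads = chase(a);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(a);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)loads;
}
```

Calling measure_ns_per_load(8u * 1024 * 1024, 1) would walk 32 MB of 4-byte indices once. Because every load depends on the result of the previous one, the hardware cannot overlap the misses within a single thread, which is why running several such chains at once is a reasonable way to probe parallelism in the memory system.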