Vincent Diepeveen wrote:
> that simple C program that measures latency,
> can you try it with a more realistic working set size also
> to measure RAM latency, so with like 2GB in total or so?

I think it measures RAM latency quite well, but doesn't exercise the TLB as hard as a 2GB dataset would. 8 threads randomly accessing 2GB is a TLB nightmare. I do not believe the kernel I'm using has the 1GB pages available on the Barcelona chips. In any case, sure, I'll run 2GB numbers.

Opteron 2350 (2.0 GHz):

pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         15328.3395     0.0921       0.0919       0.0922
Scale:        15297.8845     0.0921       0.0920       0.0922
Add:          14787.7337     0.1432       0.1428       0.1437
Triad:        15067.3052     0.1403       0.1402       0.1404
-------------------------------------------------------------
Solution Validates

gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 9.174 seconds, effective latency=136.70 ns.
With 2 thread(s), max latency was 9.186 seconds, effective latency=68.44 ns.
With 4 thread(s), max latency was 9.763 seconds, effective latency=36.37 ns.
With 8 thread(s), max latency was 10.589 seconds, effective latency=19.72 ns.

Opteron 275 (2.2 GHz):

pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:          8607.2317     0.0189       0.0186       0.0215
Scale:         8637.8088     0.0186       0.0185       0.0186
Add:           8249.3994     0.0291       0.0291       0.0292
Triad:         8244.0621     0.0301       0.0291       0.0372

gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 7.737 seconds, effective latency=115.29 ns.
With 2 thread(s), max latency was 7.722 seconds, effective latency=57.53 ns.
With 4 thread(s), max latency was 16.174 seconds, effective latency=60.25 ns.

Previously, when the Opteron DDR-2 systems were newish, a fair number of people posted stream numbers for the Opterons and Intels of the time.
My vague memory was that Intel was in the 7-9GB/sec range and the DDR-2 Opterons were in the 12.5-13.0GB/sec range.