> On Behalf Of Joe Landman
>
> Since the part is released, I can report a stream test :)
And so can I :-)  (below)

> richard.wa...@comcast.net wrote:
>
> > 64 GB/sec is the right dual-socket theoretical number for this
> > situation, and Intel presents the value of 33 GB/sec for the stream
> > triad for the dual socket boards, so 35 GB/sec could be a copy
> > perhaps, but nothing was mentioned about any benchmark in the
> > memory piece.

The STREAM benchmark was mentioned in the delltechcenter piece, but
which sub-benchmark (Triad, Copy, etc.) was not.

Here are some results we got on a Nehalem system with dual Intel Xeon
W5580 @ 3.20 GHz CPUs, 6x 2GB DDR3-1333 DIMMs (one per memory channel),
and SMT turned off.  All four STREAM components are over 37 GB/s when
run with 8 threads across the two CPUs:

------------------
OpenMP (8 threads)
Intel 11.0, icc -O3 -openmp -static
Array size = 32000000, Offset = 0
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         38705.2547      0.0134       0.0132       0.0135
Scale:        37735.3959      0.0137       0.0136       0.0138
Add:          37293.9249      0.0207       0.0206       0.0209
Triad:        37388.7235      0.0207       0.0205       0.0209

Serial
Intel 11.0, icc -O3 -static
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         10781.6770      0.0475       0.0475       0.0475
Scale:        10080.7104      0.0508       0.0508       0.0508
Add:          12646.7882      0.0608       0.0607       0.0608
Triad:        12628.8395      0.0608       0.0608       0.0608
-------------------

The 3.2 GHz W5580 part is for workstations.  We'll remeasure when we
get some servers with somewhat slower CPUs, but I would not expect a
big difference from the above.
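For anyone who hasn't looked inside stream.c, here is a minimal sketch
of the Triad kernel behind the numbers above -- it is illustrative, not
the stock source (the array size matches the run above, but timing and
validation are simplified).  The comments also spell out the
peak-bandwidth arithmetic behind the 64 GB/sec figure, assuming three
DDR3-1333 channels per socket as on the system described above:

/* Minimal sketch of what STREAM's Triad component measures.
 * Not the stock stream.c -- timing and validation are simplified.
 *
 * Peak-bandwidth arithmetic for the dual-socket box above:
 *   1333 MT/s * 8 bytes = ~10.7 GB/s per DDR3-1333 channel
 *   * 3 channels/socket = ~32 GB/s per socket
 *   * 2 sockets         = ~64 GB/s theoretical peak
 * STREAM reports the sustainable fraction of that.
 */
#include <stdio.h>
#include <omp.h>

#define N 32000000              /* matches "Array size = 32000000" above */

static double a[N], b[N], c[N];

int main(void)
{
    const double scalar = 3.0;
    long j;

    /* Parallel first-touch initialization, as stream.c does, so pages
     * end up spread across both sockets' memory controllers.         */
#pragma omp parallel for
    for (j = 0; j < N; j++) {
        a[j] = 1.0;
        b[j] = 2.0;
        c[j] = 0.0;
    }

    double t = omp_get_wtime();
    /* The Triad kernel: two reads and one write per element. */
#pragma omp parallel for
    for (j = 0; j < N; j++)
        a[j] = b[j] + scalar * c[j];
    t = omp_get_wtime() - t;

    /* STREAM counts 3 arrays * 8 bytes * N as the data moved for Triad,
     * so the reported rate is directly comparable to the channel math
     * in the header comment.                                          */
    double mbytes = 3.0 * sizeof(double) * N / 1.0e6;
    printf("Triad: %.1f MB/s\n", mbytes / t);
    return 0;
}

Build with something like "icc -O3 -openmp" (or "gcc -O3 -fopenmp").
The stock stream.c adds the Copy/Scale/Add kernels, runs each test 10
times, and keeps the best time, which is what the tables here report.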
-Tom Elken

> > In any case, I think we have the right theoretical and probable
> > real-world numbers expressed here, if people were wondering.
>
> 2-socket Intel MB with 2 dual-core (not quad-core) Nehalem E5502
> 1.8 GHz processors, running stream omp (I bumped N way up to get a
> reasonable measurement).
>
> land...@velocibunny:~/stream$ ./stream_c_omp.exe
> -------------------------------------------------------------
> STREAM version $Revision: 5.8 $
> -------------------------------------------------------------
> This system uses 8 bytes per DOUBLE PRECISION word.
> -------------------------------------------------------------
> Array size = 200000000, Offset = 0
> Total memory required = 4577.6 MB.
> Each test is run 10 times, but only
> the *best* time for each is used.
> -------------------------------------------------------------
> Number of Threads requested = 4
> -------------------------------------------------------------
> Printing one line per active thread....
> Printing one line per active thread....
> Printing one line per active thread....
> Printing one line per active thread....
> -------------------------------------------------------------
> Your clock granularity/precision appears to be 1 microseconds.
> Each test below will take on the order of 130623 microseconds.
>    (= 130623 clock ticks)
> Increase the size of the arrays if this shows that
> you are not getting at least 20 clock ticks per test.
> -------------------------------------------------------------
> WARNING -- The above is only a rough guideline.
> For best results, please be sure you know the
> precision of your system timer.
> -------------------------------------------------------------
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:         16545.0680      0.1942       0.1934       0.1958
> Scale:        16098.2714      0.1996       0.1988       0.2019
> Add:          17929.8514      0.2684       0.2677       0.2697
> Triad:        17682.8117      0.2719       0.2715       0.2722
> -------------------------------------------------------------
> Solution Validates
> -------------------------------------------------------------
>
> and for laughs, same test run (with same binary) on Shanghai 2.3 GHz
> (2376) with OMP_NUM_THREADS=4
>
> land...@pegasus-a3g:~/stream$ ./stream_c_omp.exe
> -------------------------------------------------------------
> STREAM version $Revision: 5.8 $
> -------------------------------------------------------------
> This system uses 8 bytes per DOUBLE PRECISION word.
> -------------------------------------------------------------
> Array size = 200000000, Offset = 0
> Total memory required = 4577.6 MB.
> Each test is run 10 times, but only
> the *best* time for each is used.
> -------------------------------------------------------------
> Number of Threads requested = 4
> -------------------------------------------------------------
> Printing one line per active thread....
> Printing one line per active thread....
> Printing one line per active thread....
> Printing one line per active thread....
> -------------------------------------------------------------
> Your clock granularity/precision appears to be 1 microseconds.
> Each test below will take on the order of 210029 microseconds.
>    (= 210029 clock ticks)
> Increase the size of the arrays if this shows that
> you are not getting at least 20 clock ticks per test.
> -------------------------------------------------------------
> WARNING -- The above is only a rough guideline.
> For best results, please be sure you know the
> precision of your system timer.
> -------------------------------------------------------------
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:         10885.6547      0.2943       0.2940       0.2946
> Scale:        10966.1188      0.2923       0.2918       0.2929
> Add:          12019.7420      0.4002       0.3993       0.4012
> Triad:        12127.1875      0.3965       0.3958       0.3968
> -------------------------------------------------------------
> Solution Validates
> -------------------------------------------------------------
>
> I suspect we have the pegasus memory in a non-optimal config; will
> look later on next week.
>
> Assuming we can get a pair of quad-core Nehalem units into our test
> machine, it appears that 32 GB/s on stream is quite possible.  Right
> now it looks like ~4 GB/s per thread.
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: land...@scalableinformatics.com
> web  : http://www.scalableinformatics.com
>        http://jackrabbit.scalableinformatics.com
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
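A quick back-of-envelope check of the extrapolation in Joe's last
paragraph -- the ~4 GB/s per thread and 64 GB/sec figures come from
this thread, while the 8-thread count is an assumption about a
2-socket, quad-core Nehalem test box:

/* Back-of-envelope projection; the input rates are taken from the
 * thread above, and the thread count assumes 2 sockets x 4 cores. */
#include <stdio.h>

int main(void)
{
    const double gbs_per_thread = 4.0;   /* ~4 GB/s per thread (Joe's 4-thread run) */
    const int    threads        = 8;     /* assumed: 2 sockets x 4 cores            */
    const double peak_gbs       = 64.0;  /* 2 sockets x 3 channels of DDR3-1333     */

    double projected = gbs_per_thread * threads;
    printf("Projected STREAM rate: %.0f GB/s (%.0f%% of %.0f GB/s peak)\n",
           projected, 100.0 * projected / peak_gbs, peak_gbs);
    return 0;
}

That lands at 32 GB/s, or about half of theoretical peak -- consistent
with Joe's estimate, and in the same ballpark as the ~37 GB/s measured
on 8 threads earlier in this message.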