Interesting... Given that Add and Triad are virtually the same, it's surprising that Copy and Scale are so different. IMHO Scale should look more like Copy, since both read one array and write one. Compiler effect?
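To make the comparison concrete, here is a rough sketch of how STREAM counts traffic for the four kernels, checked against the single-thread numbers quoted below. The array length N = 20,000,000 is my assumption (it is what a 457.8 MB total for three double-precision arrays works out to), and the names are only for illustration.

```python
# STREAM kernels (per element j) and the number of 8-byte words each one
# is counted as moving:
#   Copy:  c[j] = a[j]             -> 2 words
#   Scale: b[j] = q * c[j]         -> 2 words
#   Add:   c[j] = a[j] + b[j]      -> 3 words
#   Triad: a[j] = b[j] + q * c[j]  -> 3 words
WORDS_PER_ITER = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

N = 20_000_000       # ASSUMED array length, inferred from "457.8 MB"
BYTES_PER_WORD = 8   # double precision

# Single-thread rates (MB/s) and min times (s) copied from the quoted run.
rate_1t = {"Copy": 3945.5494, "Scale": 2914.9758,
           "Add": 3227.5618, "Triad": 3219.5307}
min_time_1t = {"Copy": 0.0811, "Scale": 0.1098,
               "Add": 0.1487, "Triad": 0.1491}


def counted_bytes(kernel):
    """Bytes STREAM attributes to one pass of the given kernel."""
    return WORDS_PER_ITER[kernel] * BYTES_PER_WORD * N


def rate_from_time(kernel):
    """MB/s recomputed from the rounded min time (1 MB = 1e6 bytes)."""
    return 1.0e-6 * counted_bytes(kernel) / min_time_1t[kernel]


scale_vs_copy = rate_1t["Scale"] / rate_1t["Copy"]
triad_vs_add = rate_1t["Triad"] / rate_1t["Add"]

for k in WORDS_PER_ITER:
    print(f"{k:6s} reported {rate_1t[k]:10.4f}  recomputed {rate_from_time(k):10.4f}")
print(f"Scale/Copy = {scale_vs_copy:.3f}, Triad/Add = {triad_vs_add:.3f}")
```

Copy and Scale are counted as moving the same 16 bytes per iteration, so the gap between them cannot come from the byte accounting itself.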
> here you go (dell 2950 with 8 modules and streams compiled with icc-9.1 -O3:
>
> [EMAIL PROTECTED] streamd]# hostname ; date ; for i in 1 2 3 4 5 ; do export OMP_NUM_THREADS=$i ; ./streamd | egrep "Total memory re|Number of Th|Function |Copy:|Scale:|Add:|Triad:"; done
> tbox3
> Fri Aug 11 17:59:22 CEST 2006
> Total memory required = 457.8 MB.
> Number of Threads requested = 1
> Function      Rate (MB/s)   Avg time   Min time   Max time
> Copy:         3945.5494     0.0812     0.0811     0.0813
> Scale:        2914.9758     0.1098     0.1098     0.1099
> Add:          3227.5618     0.1488     0.1487     0.1489
> Triad:        3219.5307     0.1492     0.1491     0.1493
> Total memory required = 457.8 MB.
> Number of Threads requested = 2
> Function      Rate (MB/s)   Avg time   Min time   Max time
> Copy:         4324.2058     0.0741     0.0740     0.0742
> Scale:        2999.9626     0.1068     0.1067     0.1069
> Add:          3309.2733     0.1451     0.1450     0.1452
> Triad:        3309.7031     0.1451     0.1450     0.1452
> Total memory required = 457.8 MB.
> Number of Threads requested = 3
> Function      Rate (MB/s)   Avg time   Min time   Max time
> Copy:         5422.5441     0.0590     0.0590     0.0590
> Scale:        4102.8364     0.0780     0.0780     0.0781
> Add:          4487.2464     0.1070     0.1070     0.1070
> Triad:        4487.7465     0.1070     0.1070     0.1070
> Total memory required = 457.8 MB.
> Number of Threads requested = 4
> Function      Rate (MB/s)   Avg time   Min time   Max time
> Copy:         6023.2969     0.0532     0.0531     0.0533
> Scale:        4862.4855     0.0658     0.0658     0.0659
> Add:          5264.1973     0.0912     0.0912     0.0913
> Triad:        5268.1782     0.0911     0.0911     0.0911
> Total memory required = 457.8 MB.
> Number of Threads requested = 5
> Function      Rate (MB/s)   Avg time   Min time   Max time
> Copy:         5504.9004     0.0582     0.0581     0.0582
> Scale:        4318.9044     0.0786     0.0741     0.1147
> Add:          4705.1016     0.1042     0.1020     0.1216
> Triad:        4705.2885     0.1038     0.1020     0.1184
>
> > Two cores on separate sockets should show higher numbers if it's
> > an L2 cache issue. If they are the same as those for 2 cores on one
> > socket then you have a problem with the North bridge or getting
> > full bandwidth from the FB-DIMMs.
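
As a way of putting the quoted thread-count runs side by side, here is a small sketch that computes each kernel's speedup relative to the one-thread run. The rates are copied from the output above; the helper names are hypothetical. The output does not say how the threads were placed on sockets, so this only summarizes scaling and does not separate the same-socket and cross-socket cases.

```python
# Rates (MB/s) copied from the quoted output, keyed by OMP_NUM_THREADS.
rates = {
    1: {"Copy": 3945.5494, "Scale": 2914.9758, "Add": 3227.5618, "Triad": 3219.5307},
    2: {"Copy": 4324.2058, "Scale": 2999.9626, "Add": 3309.2733, "Triad": 3309.7031},
    3: {"Copy": 5422.5441, "Scale": 4102.8364, "Add": 4487.2464, "Triad": 4487.7465},
    4: {"Copy": 6023.2969, "Scale": 4862.4855, "Add": 5264.1973, "Triad": 5268.1782},
    5: {"Copy": 5504.9004, "Scale": 4318.9044, "Add": 4705.1016, "Triad": 4705.2885},
}

KERNELS = ("Copy", "Scale", "Add", "Triad")


def speedup(kernel, threads):
    """Rate at `threads` threads divided by the one-thread rate."""
    return rates[threads][kernel] / rates[1][kernel]


def best_thread_count(kernel):
    """Thread count with the highest reported rate for this kernel."""
    return max(rates, key=lambda t: rates[t][kernel])


# Print one row per thread count, one speedup column per kernel.
print("threads " + " ".join(f"{k:>7s}" for k in KERNELS))
for t in sorted(rates):
    print(f"{t:7d} " + " ".join(f"{speedup(k, t):7.3f}" for k in KERNELS))
for k in KERNELS:
    print(f"{k}: best at {best_thread_count(k)} threads")
```
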
> >
> > A complication in this test could be that in the one core per socket case
> > the whole L2 cache is allocated to a single core. Watching performance
> > change as the array sizes grow should reveal this.
> >
> > rbw
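
A minimal sketch of how such an array-size sweep could be planned: it lists candidate array lengths and flags which ones would keep all three STREAM arrays inside the cache. The 4 MiB L2 size is purely an assumption for illustration (the thread does not state it), and the function names are hypothetical. As far as I know N is a compile-time constant in stream.c, so each size would mean a rebuild.

```python
BYTES_PER_WORD = 8                   # double precision
ARRAYS = 3                           # STREAM uses three arrays: a, b, c
ASSUMED_L2_BYTES = 4 * 1024 * 1024   # ASSUMPTION: 4 MiB L2, not from the thread


def working_set_bytes(n):
    """Total bytes touched by the three arrays of length n."""
    return ARRAYS * BYTES_PER_WORD * n


def fits_in_cache(n, cache_bytes=ASSUMED_L2_BYTES):
    """True if all three arrays would fit in a cache of cache_bytes."""
    return working_set_bytes(n) <= cache_bytes


def sweep_sizes(start=10_000, stop=20_000_000, factor=2):
    """Geometric list of array lengths from start up to and including stop."""
    sizes = []
    n = start
    while n < stop:
        sizes.append(n)
        n *= factor
    sizes.append(stop)
    return sizes


for n in sweep_sizes():
    where = "fits in assumed L2" if fits_in_cache(n) else "spills to memory"
    print(f"N = {n:>10d}  working set = {working_set_bytes(n) / 2**20:8.1f} MiB  ({where})")
```

If the one-core-per-socket numbers only pull ahead for the sizes flagged as fitting, that would point at the cache rather than the memory path.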