Sorry, forgot to reply all... don't you hate Gmail's interface sometimes?
What is the memory latency of the Woodcrest machines? Memory latency is what really determines your achievable memory bandwidth. If Intel hasn't made any improvements in latency, then the limited number of outstanding loads in the x86-64 architecture will limit the bandwidth regardless of the MB/s you throw at it.

On 8/15/06, Richard Walsh <[EMAIL PROTECTED]> wrote:
Joe Landman wrote:

>4-threads
>
>Copy:   6645.4125  0.0965  0.0963  0.0976
>Scale:  6994.6233  0.0916  0.0915  0.0917
>Add:    6373.0207  0.1508  0.1506  0.1509
>Triad:  6710.7522  0.1432  0.1431  0.1433
>
>I may have been Bill's 10 GB/s source, and that may have been a mixup on my part. 10 GB/sec of course comes from the advertised bandwidth off a single socket.

Yes, this is quite disappointing, because the "on-paper" numbers from each socket to the Northbridge are nicely balanced with the 4-channel FB-DIMM numbers. Then there is all the discussion of the advantages of the shared L2 cache, the shared-cache-intelligent pre-fetch engines, and cool memory disambiguation. Seemingly irrelevant, I guess, if the Northbridge is still under-designed.

Is it possible that the compilers are just not ready to use some of these features effectively? On the other hand, STREAM is sufficiently simple that these features probably do not come into play anyway. The real application benchmarks with some quantity of locality look better.

Anyone working on compilers care to comment on what the bottleneck really is?

rbw
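
To put rough numbers on the latency argument at the top of this message, here is a minimal sketch of the usual Little's law estimate: sustained bandwidth is bounded by the bytes in flight divided by the memory latency. The 64-byte line size, the count of 8 outstanding loads, and the latencies in the loop are illustrative assumptions only, not measured Woodcrest figures, and the function names are made up for this example.

```python
# Little's law sketch: bandwidth <= (outstanding loads * line size) / latency.
# All inputs below are illustrative assumptions, not measured Woodcrest values.

def max_bandwidth_mb_s(outstanding_loads, line_bytes, latency_ns):
    """Upper bound on sustained bandwidth in MB/s (1 MB = 1e6 bytes)."""
    bytes_in_flight = outstanding_loads * line_bytes
    # 1 byte/ns equals 1000 MB/s.
    return bytes_in_flight * 1000 / latency_ns


def loads_needed(target_mb_s, line_bytes, latency_ns):
    """Outstanding line-sized loads required to sustain target_mb_s."""
    return target_mb_s * latency_ns / 1000 / line_bytes


if __name__ == "__main__":
    line_bytes = 64          # assumed cache-line size
    outstanding_loads = 8    # assumed per-core limit, for illustration
    for latency_ns in (60, 100, 140):  # hypothetical latencies
        bw = max_bandwidth_mb_s(outstanding_loads, line_bytes, latency_ns)
        print(f"latency {latency_ns:3d} ns -> at most {bw:7.1f} MB/s")
    # Loads in flight needed to reach the advertised 10 GB/s at 100 ns:
    print(loads_needed(10000, line_bytes, 100))
```

Under these assumed inputs the bound scales inversely with latency, so halving the latency doubles the ceiling, while adding raw MB/s on the memory side does nothing once the loads in flight are the limit.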
--
Dr Stuart Midgley
[EMAIL PROTECTED]
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf