Sorry, forgot to reply-all... don't you hate Gmail's interface sometimes?


What is the memory latency of the Woodcrest machines?  I ask because
memory latency really determines your memory bandwidth.

If Intel hasn't made any improvements in latency, then the limited
number of outstanding loads in the x86-64 architecture will limit the
bandwidth regardless of the MB/s you throw at it.
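
To put rough numbers on that argument, here is a minimal sketch of the
bound (Little's law: bytes in flight divided by latency).  The function
name, the 64-byte line size, and the latency and outstanding-load values
are all illustrative assumptions on my part, not measured Woodcrest
figures.

```python
# Little's law applied to memory: sustained bandwidth cannot exceed
# (bytes in flight) / (latency).  All inputs below are hypothetical.

def max_bandwidth_mb_s(outstanding_loads, line_bytes, latency_ns):
    """Upper bound on sustained bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bytes_in_flight = outstanding_loads * line_bytes
    # bytes per nanosecond equals GB/s, so scale by 1000 to get MB/s
    return bytes_in_flight * 1000.0 / latency_ns


LINE_BYTES = 64  # assumed cache-line size

if __name__ == "__main__":
    # Assumed latencies and outstanding-load counts, for illustration only
    for latency_ns in (60, 100, 150):
        for outstanding in (4, 8, 16):
            bound = max_bandwidth_mb_s(outstanding, LINE_BYTES, latency_ns)
            print(f"{latency_ns:4d} ns  {outstanding:2d} loads  {bound:9.1f} MB/s")
```

Under these assumptions the bound scales linearly with the number of
loads in flight and inversely with latency, so halving the latency buys
as much as doubling the outstanding loads, whatever the peak rating of
the memory channels.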



On 8/15/06, Richard Walsh <[EMAIL PROTECTED]> wrote:
Joe Landman wrote:

 >4-threads
 >
 >Copy:        6645.4125       0.0965       0.0963       0.0976
 >Scale:       6994.6233       0.0916       0.0915       0.0917
 >Add:         6373.0207       0.1508       0.1506       0.1509
 >Triad:       6710.7522       0.1432       0.1431       0.1433
 >
 >I may have been Bill's 10 GB/s source, and that may have been a mixup
 >on my part.

10 GB/sec of course comes from the advertised bandwidth off a single socket.

Yes, this is quite disappointing because the "on-paper" numbers from each
socket to the Northbridge are nicely balanced with the 4-channel FB-DIMM
numbers.  Then there is all the discussion of the advantages of the
shared L2 cache, the shared-cache-intelligent pre-fetch engines, and the
cool memory disambiguation.  All seemingly irrelevant, I guess, if the
Northbridge is still underdesigned.

Is it possible that the compilers are just not ready to use some of
these features effectively?  On the other hand, STREAM is sufficiently
simple that these features probably do not come into play anyway.  The
real application benchmarks with some quantity of locality look better.

Anyone working on compilers care to comment on what the bottleneck
really is?

rbw



--
Dr Stuart Midgley
[EMAIL PROTECTED]
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
