How do your Rate numbers correlate to the max bandwidth of 32GB/s
(http://en.wikipedia.org/wiki/GeForce_8_Series)?

good point.  I had assumed the quoted numbers were merely in-cache,
but it does claim to be running on an array size of 2e6 elements
(8e6 bytes per array in single precision), which seems a bit large
for in-cache (though very small for a STREAM run).

http://forums.nvidia.com/index.php?showtopic=52686

this quotes a plausible 64-65 GB/s on a C870 (76.8 GB/s theoretical peak).

Running this on my 8600 card I get:

STREAM Benchmark implementation in CUDA
 Array size (single precision)=2000000
 using 128 threads per block, 15625 blocks
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:      291777.6696       0.0001       0.0001       0.0001
Scale:     291777.6696       0.0001       0.0001       0.0001
Add:       437666.5043       0.0001       0.0001       0.0001
Triad:     437666.5043       0.0001       0.0001       0.0001

this is implausible: 290-440 GB/s is roughly 9-14x the 32 GB/s peak
quoted above.  my guess is the timing code is broken.
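
A quick way to see what the reported rates imply is to turn them back into times. The sketch below is only a back-of-envelope check, and it rests on two assumptions that the post does not state: that "MB" means 1e6 bytes, and that this CUDA port counts traffic the way standard STREAM does (Copy and Scale touch two arrays, Add and Triad touch three). The names `implied_seconds` and `floor_seconds` are made up for illustration.

```python
# Back-of-envelope check of the STREAM output above.
# Assumptions (not stated in the post): MB means 1e6 bytes, and the CUDA port
# counts traffic like standard STREAM (Copy/Scale touch 2 arrays, Add/Triad 3).

N = 2_000_000          # array size reported by the benchmark
BYTES_PER_ELEM = 4     # single precision
PEAK_MB_S = 32_000.0   # the 32 GB/s figure from the question

# kernel name -> (arrays touched, reported rate in MB/s)
reported = {
    "Copy":  (2, 291777.6696),
    "Scale": (2, 291777.6696),
    "Add":   (3, 437666.5043),
    "Triad": (3, 437666.5043),
}

def implied_seconds(arrays, rate_mb_s):
    """Time the kernel must have taken to produce the reported rate."""
    return arrays * N * BYTES_PER_ELEM / (rate_mb_s * 1e6)

def floor_seconds(arrays):
    """Shortest possible time if memory ran at the full 32 GB/s peak."""
    return arrays * N * BYTES_PER_ELEM / (PEAK_MB_S * 1e6)

for name, (arrays, rate) in reported.items():
    t = implied_seconds(arrays, rate)
    print(f"{name:6s} implied {t * 1e6:6.1f} us, "
          f"floor at peak {floor_seconds(arrays) * 1e6:6.1f} us, "
          f"rate/peak {rate / PEAK_MB_S:4.1f}x")
```

Under those assumptions all four kernels work out to the same time, about 55 microseconds, even though Add and Triad move 1.5x as many bytes as Copy and Scale. At 32 GB/s, Copy alone could not finish in less than 500 microseconds. A time that does not change with the amount of data moved suggests the timer is capturing some fixed cost rather than the kernel itself. One possible cause, which the post does not confirm, is that CUDA kernel launches return to the host before the kernel finishes, so a host timer with no synchronization would only see launch overhead.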
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
