How do your Rate numbers correlate to the max bandwidth of 32 GB/s (http://en.wikipedia.org/wiki/GeForce_8_Series)?
good point. I had assumed the quoted numbers were merely in-cache, but the run claims an array size of 2e6 elements (8e6 bytes in single precision), which seems a bit large to fit in cache (though very small for a STREAM run).
http://forums.nvidia.com/index.php?showtopic=52686
this quotes a plausible 64-65 GB/s on a C870 (76.8 GB/s theoretical peak).
Running this on my 8600 card I get:

  STREAM Benchmark implementation in CUDA
  Array size (single precision)=2000000
  using 128 threads per block, 15625 blocks
  Function      Rate (MB/s)   Avg time   Min time   Max time
  Copy:         291777.6696     0.0001     0.0001     0.0001
  Scale:        291777.6696     0.0001     0.0001     0.0001
  Add:          437666.5043     0.0001     0.0001     0.0001
  Triad:        437666.5043     0.0001     0.0001     0.0001
this is implausible. my guess is the timing code is broken.
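a quick sanity check on the 8600 numbers, as a rough sketch rather than a diagnosis. it assumes the usual STREAM byte counting (Copy and Scale touch two arrays, Add and Triad touch three), which the CUDA port may or may not follow, and it takes the 32 GB/s peak quoted earlier in the thread at face value. the names below are just for illustration.

```python
# Back out the elapsed time implied by each reported rate and compare
# the rate with the quoted peak bandwidth.
# Assumption: standard STREAM byte counting (2 arrays for Copy/Scale,
# 3 arrays for Add/Triad); the CUDA port may count differently.

N = 2_000_000          # array size from the posted output
BYTES_PER_ELEM = 4     # single precision
PEAK_MB_S = 32_000.0   # 32 GB/s peak quoted earlier in the thread

# kernel name -> (arrays touched, reported rate in MB/s)
reported = {
    "Copy":  (2, 291777.6696),
    "Scale": (2, 291777.6696),
    "Add":   (3, 437666.5043),
    "Triad": (3, 437666.5043),
}


def implied_time_us(arrays, rate_mb_s):
    """Elapsed time in microseconds implied by a reported rate."""
    megabytes = arrays * N * BYTES_PER_ELEM / 1e6
    return megabytes / rate_mb_s * 1e6


# kernel name -> (implied time in us, rate as a multiple of peak)
results = {
    name: (implied_time_us(arrays, rate), rate / PEAK_MB_S)
    for name, (arrays, rate) in reported.items()
}

for name, (t_us, ratio) in results.items():
    print(f"{name:6s} implied time {t_us:6.2f} us, {ratio:5.1f}x quoted peak")
```

under those assumptions all four kernels come out at the same elapsed time, about 55 microseconds, and the rates are roughly 9x to 14x the quoted peak. identical times for kernels that move different amounts of data suggest the timer is not measuring the kernels themselves. one possible cause (a guess, not something the posted output confirms) is that the host timer stops before the asynchronous kernel launch has finished.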