STREAM Benchmark implementation in CUDA
Array size (single precision)=8000000
using 128 threads per block, 62500 blocks
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16706.3212       0.0039       0.0038       0.0044
Scale:      16666.2770       0.0046       0.0038       0.0100
Add:        18408.0866       0.0053       0.0052       0.0056
Triad:      18738.6603       0.0052       0.0051       0.0055

I got
 STREAM Benchmark implementation in CUDA
 Array size (single precision)=8000000
 using 128 threads per block, 62500 blocks
Copy:       50006.6051       0.0013       0.0013       0.0013
Scale:      50006.6051       0.0013       0.0013       0.0013
Add:        56409.8044       0.0017       0.0017       0.0017
Triad:      56409.8044       0.0017       0.0017       0.0017

on a "nVidia Corporation G80 [Quadro FX 4600] (rev a2)".
wikipedia quotes 67.2 GB/s theoretical.

it didn't matter whether the machine was in init 3 or 5, though the X config was just an idle 1280x1024 server.

Kudos to Nvidia for having a linux friendly toolchain that I could find, download, install, and compile a code with minimal hassle.

absolutely.  AMD has really dropped the ball on this, even though it looks
like they at least announced availability of DP earlier...
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf

Reply via email to