STREAM Benchmark implementation in CUDA
Array size (single precision)=8000000
using 128 threads per block, 62500 blocks
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:          16706.3212     0.0039     0.0038     0.0044
Scale:         16666.2770     0.0046     0.0038     0.0100
Add:           18408.0866     0.0053     0.0052     0.0056
Triad:         18738.6603     0.0052     0.0051     0.0055
I got
STREAM Benchmark implementation in CUDA
Array size (single precision)=8000000
using 128 threads per block, 62500 blocks
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:          50006.6051     0.0013     0.0013     0.0013
Scale:         50006.6051     0.0013     0.0013     0.0013
Add:           56409.8044     0.0017     0.0017     0.0017
Triad:         56409.8044     0.0017     0.0017     0.0017
on a "nVidia Corporation G80 [Quadro FX 4600] (rev a2)".
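
for anyone who wants to sanity-check the Rate column, here is a rough sketch in Python that works backwards from the array size. it assumes the usual STREAM counting convention (Copy and Scale touch two arrays, Add and Triad touch three), 4-byte single-precision elements, and MB = 10^6 bytes. the function names are just for illustration and are not taken from the benchmark source.

```python
# Sketch: relate STREAM-style rates to array size and kernel time.
# Assumptions (not from the benchmark source): standard STREAM byte
# counts per kernel, 4-byte floats, and MB meaning 10**6 bytes.

N = 8_000_000            # array size from the output above
BYTES_PER_ELEM = 4       # single precision
THREADS_PER_BLOCK = 128  # from the output above

# number of arrays each kernel reads or writes
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

def blocks_needed(n, threads_per_block):
    """Ceiling division: one thread per array element."""
    return (n + threads_per_block - 1) // threads_per_block

def bytes_moved(kernel, n=N):
    """Total bytes read plus written by one pass of the kernel."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEM * n

def implied_time(kernel, rate_mb_s, n=N):
    """Seconds per pass implied by a reported rate in MB/s."""
    return bytes_moved(kernel, n) / (rate_mb_s * 1e6)

# rates reported for the Quadro FX 4600 run
reported = {"Copy": 50006.6051, "Scale": 50006.6051,
            "Add": 56409.8044, "Triad": 56409.8044}

print("blocks:", blocks_needed(N, THREADS_PER_BLOCK))
for name, rate in reported.items():
    mb = bytes_moved(name) / 1e6
    print(f"{name:6s} {mb:6.1f} MB  {implied_time(name, rate):.4f} s")
```

the implied times come out at 0.0013 s for Copy/Scale and 0.0017 s for Add/Triad, which lines up with the Min time column, and 8000000 elements at 128 threads per block gives the 62500 blocks shown in the header.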
Wikipedia quotes 67.2 GB/s theoretical peak memory bandwidth for this card.
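
to put the measured rates next to that figure, a small sketch that computes the fraction of theoretical peak each kernel reaches. the 67.2 GB/s is the Wikipedia number quoted above; treating GB as 10^9 bytes and MB as 10^6 bytes is an assumption, and the helper name is made up for this example.

```python
# Sketch: measured STREAM rate as a fraction of quoted theoretical peak.
# Assumption: GB = 10**9 bytes and MB = 10**6 bytes, so MB/s / 1000 = GB/s.

PEAK_GB_S = 67.2  # theoretical figure quoted from Wikipedia

# rates reported for the Quadro FX 4600 run, in MB/s
measured_mb_s = {"Copy": 50006.6051, "Scale": 50006.6051,
                 "Add": 56409.8044, "Triad": 56409.8044}

def fraction_of_peak(rate_mb_s, peak_gb_s=PEAK_GB_S):
    """Return measured bandwidth divided by theoretical peak (0..1)."""
    return (rate_mb_s / 1000.0) / peak_gb_s

for name, rate in measured_mb_s.items():
    pct = 100.0 * fraction_of_peak(rate)
    print(f"{name:6s} {rate / 1000.0:5.1f} GB/s  {pct:3.0f}% of peak")
```

under those assumptions Copy and Scale land at about 74% of peak and Add and Triad at about 84%.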
it didn't matter whether the machine was in runlevel 3 or 5 (init 3 or 5),
though the X config was just an idle 1280x1024 server.
Kudos to Nvidia for having a Linux-friendly toolchain that I could find,
download, install, and compile a code with, with minimal hassle.
absolutely. AMD has really dropped the ball on this, even though it looks
like they at least announced availability of DP (double precision) earlier...
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org