On 4/1/07, Mark Hahn <[EMAIL PROTECTED]> wrote:


I assume this is only single precision, and I would guess that for
numerical stability you must be limited to fairly short FFTs.
What kind of peak flops do you see? What's the overhead of shoving
data onto the GPU and getting it back? (Or am I wrong that the GPU
cannot do an FFT in main (host) memory?)
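
(As a rough illustration of what such a measurement could look like with
CUFFT and the CUDA runtime -- the transform size, input data, and
event-based timing below are assumptions for the sketch, not numbers from
this thread; link with -lcufft -lcudart.)

/* Single-precision C2C FFT on the GPU, timed including the
 * host <-> device transfers. Illustrative sketch only. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cufft.h>

int main(void)
{
    const int N = 1 << 20;                  /* 1M-point 1D transform (assumed size) */
    size_t bytes = N * sizeof(cufftComplex);

    cufftComplex *h = (cufftComplex *)malloc(bytes);
    for (int i = 0; i < N; ++i) { h[i].x = (float)i; h[i].y = 0.0f; }

    cufftComplex *d;
    cudaMalloc((void **)&d, bytes);

    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, 1);    /* single-precision complex-to-complex */

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    /* Time: host -> device, in-place forward FFT, device -> host */
    cudaEventRecord(t0, 0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cufftExecC2C(plan, d, d, CUFFT_FORWARD);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(t1, 0);
    cudaEventSynchronize(t1);

    float ms;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("FFT + transfers: %.3f ms\n", ms);

    cufftDestroy(plan);
    cudaFree(d); free(h);
    cudaEventDestroy(t0); cudaEventDestroy(t1);
    return 0;
}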

I will run some benchmarks in the next few days (I usually do more than
just an FFT).
I remember some numbers for SGEMM (real SGEMM, C = alpha*A*B + beta*C):
120 Gflops on the board, 80 Gflops measured from the host (with all the
I/O overhead), for N = 2048.
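
As an illustration only, a minimal sketch of how a run like that could be
timed from the host with the CUBLAS cublasSgemm call, transfers included
(the input data and wall-clock timing are assumptions, not the actual
benchmark code; link with -lcublas -lcudart):

/* C = alpha*A*B + beta*C timed from the host, transfers included.
 * N = 2048 matches the size quoted above; everything else is assumed. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <cublas.h>

static double wall(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int main(void)
{
    const int N = 2048;
    const float alpha = 1.0f, beta = 0.0f;
    size_t elems = (size_t)N * N;

    float *A = malloc(elems * sizeof(float));
    float *B = malloc(elems * sizeof(float));
    float *C = malloc(elems * sizeof(float));
    for (size_t i = 0; i < elems; ++i) { A[i] = 1.0f; B[i] = 2.0f; C[i] = 0.0f; }

    cublasInit();

    float *dA, *dB, *dC;
    cublasAlloc(elems, sizeof(float), (void **)&dA);
    cublasAlloc(elems, sizeof(float), (void **)&dB);
    cublasAlloc(elems, sizeof(float), (void **)&dC);

    double t0 = wall();
    /* host -> device, SGEMM on the board, device -> host */
    cublasSetMatrix(N, N, sizeof(float), A, N, dA, N);
    cublasSetMatrix(N, N, sizeof(float), B, N, dB, N);
    cublasSgemm('n', 'n', N, N, N, alpha, dA, N, dB, N, beta, dC, N);
    cublasGetMatrix(N, N, sizeof(float), dC, N, C, N);
    double t1 = wall();

    double gflops = 2.0 * N * (double)N * N / (t1 - t0) / 1e9;
    printf("SGEMM N=%d: %.1f Gflops from the host, I/O included\n", N, gflops);

    cublasFree(dA); cublasFree(dB); cublasFree(dC);
    cublasShutdown();
    free(A); free(B); free(C);
    return 0;
}

The gap between the on-board figure and the host-measured figure above is
exactly this transfer overhead.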



> You can offload only the compute-intensive parts of your code to the GPU
> from C and C++ (writing a wrapper from Fortran should be trivial).

Sure, but what's the cost (in time and CPU overhead) of moving data
around like this?

It depends on your chipset and on other details (cold access, data
in cache, pinned memory): it ranges from around 1 GB/s to 3 GB/s.
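
A minimal sketch of how pageable and pinned host-to-device bandwidth could
be compared with cudaMemcpy and cudaMallocHost (the 64 MB buffer size and
event-based timing are assumptions; link with -lcudart):

/* Compare pageable vs. pinned (page-locked) host -> device bandwidth. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

static float copy_time_ms(void *dst, const void *src, size_t bytes)
{
    cudaEvent_t t0, t1;
    float ms;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventRecord(t0, 0);
    cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1, 0);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    cudaEventDestroy(t0); cudaEventDestroy(t1);
    return ms;
}

int main(void)
{
    const size_t bytes = 64 << 20;          /* 64 MB test buffer (assumed) */
    void *dev, *pageable, *pinned;

    cudaMalloc(&dev, bytes);
    pageable = malloc(bytes);
    cudaMallocHost(&pinned, bytes);         /* page-locked (pinned) allocation */
    memset(pageable, 1, bytes);
    memset(pinned, 1, bytes);

    float ms_pageable = copy_time_ms(dev, pageable, bytes);
    float ms_pinned   = copy_time_ms(dev, pinned, bytes);

    printf("pageable: %.2f GB/s\n", bytes / (ms_pageable * 1e6));
    printf("pinned  : %.2f GB/s\n", bytes / (ms_pinned * 1e6));

    free(pageable);
    cudaFreeHost(pinned);
    cudaFree(dev);
    return 0;
}

Pinned memory lets the DMA engine copy directly without staging through a
pageable buffer, which is where the upper end of that 1-3 GB/s range
comes from.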


> The current generation of the hardware supports only single precision,
> but there will be a double precision version towards the end of the
> year.

Do you mean synthetic doubles? I'm guessing that the hardware isn't
going to gain the much wider multipliers necessary to support doubles
at the same latency as singles...


Can't comment on this one..... :-)


Massimiliano
