Mark,
CUDA comes with full BLAS and FFT libraries (the FFT covers 1D, 2D and 3D transforms).
You can get a significant speedup even for 2D transforms or for a batch of 1D transforms.

You can offload only the compute-intensive parts of your code to the GPU
from C and C++ (writing a wrapper for Fortran should be trivial).
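
As a rough illustration of how thin such a Fortran wrapper can be, here is a
minimal sketch. The names gpusaxpy and gpusaxpy_ are hypothetical, and the loop
is only a CPU stand-in for the place where the data would be copied to the GPU
and the library routine called. It assumes the common convention of a lowercase
symbol with a trailing underscore and arguments passed by reference (gfortran's
default); other compilers may mangle names differently.

```c
/* Hypothetical C entry point for the offloaded operation y = a*x + y.
   The loop below is a CPU stand-in: a real version would copy x and y
   to the GPU, run the library routine there, and copy y back. */
void gpusaxpy(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Fortran-callable wrapper (assumed convention: trailing underscore,
   every argument passed by reference). It only dereferences the
   scalars and forwards the array pointers unchanged. */
void gpusaxpy_(const int *n, const float *a, const float *x, float *y)
{
    gpusaxpy(*n, *a, x, y);
}
```

Under that convention the Fortran side would be just
"call gpusaxpy(n, a, x, y)" with default-kind integer n and real(4) a, x, y.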

The current generation of the hardware supports only single precision,
but there will be a double precision version towards the end of the
year.

Massimiliano
PS: I work on CUDA at Nvidia, so I may be a little biased...


On 4/1/07, Mark Hahn <[EMAIL PROTECTED]> wrote:
as far as I know, there are not any well-developed libraries which simply
harness whatever GPU you provide, but don't require your whole program to
be GPU-ized.  the cost of sharing data with a GPU is significant, but
blas-3 might have a high enough work-to-size ratio to make it feasible.
3d fft's might also be expressible in GPU-friendly terms (the trick would
be to utilize not fight the GPU's inherent memory-access preferences.)
perhaps some MCMC stuff might be SIMD-able?  I doubt that sequence analysis
would make much sense, since GPUs are not well-tuned to access host memory,
and sequence programs are not actually that compute-intensive.  I'd guess
that anything involving sparse matrices would be difficult to do on a GPU.
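
To put a rough number on the work-to-size ratio mentioned above, the sketch
below compares flops per byte moved between host and GPU for a single-precision
n x n matrix multiply (BLAS-3) and for a saxpy (BLAS-1). It assumes the usual
operation counts (2n^3 and 2n flops) and that every operand crosses the bus
exactly once; the function names are made up for the example.

```c
/* Single-precision n x n matrix multiply C = A*B:
   2*n^3 flops; A and B go to the GPU and C comes back, 3*n^2 floats. */
double sgemm_flops_per_byte(double n)
{
    double flops = 2.0 * n * n * n;
    double bytes = 3.0 * n * n * sizeof(float);
    return flops / bytes;   /* grows linearly with n */
}

/* Single-precision saxpy y = a*x + y on vectors of length n:
   2*n flops; x and y go to the GPU and y comes back, 3*n floats. */
double saxpy_flops_per_byte(double n)
{
    double flops = 2.0 * n;
    double bytes = 3.0 * n * sizeof(float);
    return flops / bytes;   /* constant, independent of n */
}
```

With 4-byte floats the matrix multiply ratio is n/6, so it keeps improving as
the matrices grow, while the saxpy ratio stays at 1/6 no matter how long the
vectors are.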
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
