Mark,

CUDA comes with full BLAS and FFT libraries (the FFT library handles 1D, 2D and 3D transforms). You can get a worthwhile speedup even for 2D transforms or for a batch of 1D transforms.

You can offload only the compute-intensive parts of your code to the GPU from C and C++ (writing a wrapper to call them from Fortran should be trivial). The current generation of the hardware supports only single precision, but there will be a double-precision version towards the end of the year.

Massimiliano

PS: I work on CUDA at Nvidia, so I may be a little biased...

On 4/1/07, Mark Hahn <[EMAIL PROTECTED]> wrote:
as far as I know, there are not any well-developed libraries which simply harness whatever GPU you provide, but don't require your whole program to be GPU-ized. the cost of sharing data with a GPU is significant, but BLAS-3 might have a high enough work-to-size ratio to make it feasible. 3D FFTs might also be expressible in GPU-friendly terms (the trick would be to utilize, not fight, the GPU's inherent memory-access preferences). perhaps some MCMC stuff might be SIMD-able? I doubt that sequence analysis would make much sense, since GPUs are not well-tuned to access host memory, and sequence programs are not actually that compute-intensive. I'd guess that anything involving sparse matrices would be difficult to do on a GPU.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf