If you are old enough to remember when the first distributed-memory computers appeared on the scene, this is déjà vu. Developers used to programming shared memory (mostly with directives) complained about the new programming models (PVM, MPL, MPI). Even today, if you have a serial code, there is no tool that will make it run on a cluster. Even on a single system, if you try an auto-parallelizing/auto-vectorizing compiler on a real code, your results will probably be disappointing.
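To make that concrete, here is a contrived Fortran fragment (not from any real application): the first loop is trivially vectorizable, while the second carries a dependence that no amount of compiler flags will parallelize away.

  program autopar_demo
    implicit none
    integer, parameter :: n = 1000000
    real :: a(n), b(n), c(n)
    integer :: i

    a = 1.0
    b = 2.0

    ! Independent iterations: any auto-vectorizer handles this one.
    do i = 1, n
       c(i) = a(i) + b(i)
    end do

    ! Loop-carried dependence: a(i) needs a(i-1) from the previous
    ! iteration, so the compiler has to keep this loop serial.
    do i = 2, n
       a(i) = a(i-1) + b(i)
    end do

    print *, c(n), a(n)   ! keep the results live
  end program autopar_demo

Real codes are full of loops of the second kind, plus indirect addressing and calls the compiler cannot see through.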
When rewriting some portions of your code to use the GPU can give you a 10x boost on a production code, plenty of developers will write that code, whether because time to solution is important or because it makes simulations possible that were impossible before (for example, algorithms that were simply too slow on CPUs; the discontinuous Galerkin method is a perfect example). The effort clearly depends on the code, the programmer, and the tool used: you can go from fully custom GPU code with CUDA or OpenCL, to automatically generated CUF kernels from PGI, to directives with HMPP or the PGI Accelerator model (see the CUF sketch at the bottom of this message).

In situations where time to solution translates directly into money, for example oil and gas, GPUs are the answer today (you would be surprised by the number of GPUs in Houston). Look at the performance and scaling of AMBER (MPI+CUDA), http://ambermd.org/gpus/benchmarks.htm, and tell me the results were not worth the effort.

Is GPU programming for everyone? Probably not, in the same measure that parallel programming is not for everyone. Better tools will lower the threshold, but a threshold will always be present.

Massimiliano

PS: Full disclosure: I work at Nvidia on CUDA (CUDA Fortran, application porting with CUDA, MPI+CUDA).

2011/4/4 "C. Bergström" <cbergst...@pathscale.com>:
> Herbert Fruchtl wrote:
>> They hear great success stories (which in reality are often prototype
>> implementations that do one carefully chosen benchmark well), then look
>> at the API, look at their existing code, and postpone the start of their
>> project until they have six months spare time for it. And we know when
>> that is.
>>
>> The current approach with more or less vendor specific libraries (be
>> they "open" or not) limits the uptake of GPU computing to a few hardcore
>> developers of experimental codes who don't mind rewriting their code
>> every two years. It won't become mainstream until we have a compiler
>> that turns standard Fortran (or C++, if it has to be) into GPU code.
>> Anything that requires more change than let's say OpenMP directives is
>> doomed, and rightly so.
>>
> Hi Herbert,
>
> I think your perspective pretty much nails it.
>
> (shameless self promotion)
> http://www.pathscale.com/ENZO (PathScale HMPP - native codegen)
> http://www.pathscale.com/pdf/PathScale-ENZO-1.0-UserGuide.pdf
> http://www.caps-entreprise.com/hmpp.html (CAPS HMPP - source to source)
>
> This is really only the tip of the problem and there must also be
> solutions for scaling *efficiently* across the cluster. (No, MPI + CUDA
> or even HMPP is *not* the answer imho.)
>
> ./C
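Since Herbert's bar is "no more change than OpenMP directives", here is roughly what the CUF kernel route looks like today. This is a from-memory sketch of a toy SAXPY, not compiled code, and the names are made up:

  program cuf_saxpy
    use cudafor
    implicit none
    integer, parameter :: n = 1000000
    real, parameter :: a = 2.0
    real, device :: x_d(n), y_d(n)   ! arrays resident in GPU memory
    real :: y(n)
    integer :: i

    x_d = 1.0      ! plain assignment does the host-to-device copy
    y_d = 0.0

    ! The directive asks the compiler to generate and launch a GPU
    ! kernel for this loop; <<<*,*>>> leaves the launch configuration
    ! to the compiler.
    !$cuf kernel do <<<*,*>>>
    do i = 1, n
       y_d(i) = y_d(i) + a * x_d(i)
    end do

    y = y_d        ! copy the result back to the host
    print *, y(1), y(n)
  end program cuf_saxpy

The delta from the serial version is a couple of declarations and one directive; HMPP and the PGI Accelerator directives are in the same spirit.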