Note that this is from a "user" perspective, as I have no particular plan to develop the details of this implementation, but I've thought for a long time that GPU support could be great for numpy (I would also vote for OpenCL support over CUDA, although conceptually they seem quite similar)... But what exactly would the large-scale plan be? One of the advantages of GPGPUs is that they are particularly suited to rather complicated parallelizable algorithms, while the numpy-level basic operations are just simple arithmetic. So while I'd love to see it working, it's unclear to me exactly how much is gained at the core numpy level, especially given that it's limited to single precision on most GPUs.
Now, linear algebra or FFTs on a GPU would probably be a huge boon, I'll admit - especially if they come in the form of drop-in replacements for the numpy or scipy versions. By the way, I noticed no one mentioned the GPUArray class in pycuda (and it looks like there's something similar in pyopencl) - it seems that has already done a fair amount of the work (a minimal usage sketch is appended below the quoted message)... http://documen.tician.de/pycuda/array.html#pycuda.gpuarray.GPUArray

On Thu, Aug 6, 2009 at 10:41 AM, James Bergstra <bergs...@iro.umontreal.ca> wrote:
> On Thu, Aug 6, 2009 at 1:19 PM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
> > It almost looks like you are reimplementing numpy, in C++ no less. Is
> > there any reason why you aren't working with a numpy branch and just
> > adding ufuncs?
>
> I don't know how that would work. The ufuncs need a datatype to work
> with, and AFAIK it would break everything if a numpy ndarray pointed
> to memory on the GPU. Could you explain what you mean a little more?
>
> > I'm also curious if you have thoughts about how to use the GPU
> > pipelines in parallel.
>
> Current thinking for ufunc-type computations:
> 1) divide the tensors into subtensors whose dimensions have
> power-of-two sizes (this permits a fast integer -> ndarray coordinate
> computation using bit shifting),
> 2) launch a kernel for each subtensor in its own stream to use the
> parallel pipelines,
> 3) sync and return.
>
> This is a pain to do without automatic code generation, though.
> Currently we're using macros, but that's not pretty. C++ has
> templates, which we don't really use yet but were planning to use;
> these have some power to generate code. The 'theano' project
> (www.pylearn.org/theano), for which cuda-ndarray was created, has a
> more powerful code-generation mechanism similar to weave. This
> algorithm is used in theano-cuda-ndarray. Scipy.weave could be very
> useful for generating code for specific shapes/ndims on demand, if
> weave could use nvcc.
>
> James
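Here's a minimal sketch of what GPUArray looks like in practice (assuming pycuda is installed and a CUDA device is available) - the array data lives on the device, and the elementwise arithmetic happens there, in single precision as noted above:

    import numpy as np
    import pycuda.autoinit          # importing this sets up a CUDA context
    import pycuda.gpuarray as gpuarray

    a = np.random.randn(4, 4).astype(np.float32)  # float32 is the GPU-friendly dtype
    a_gpu = gpuarray.to_gpu(a)                    # host -> device copy
    b_gpu = 2 * a_gpu + 1                         # elementwise ops run on the GPU
    b = b_gpu.get()                               # device -> host copy

    assert np.allclose(b, 2 * a + 1)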
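And regarding step 1) of James's plan above, a quick illustration (mine, not taken from cuda-ndarray) of why power-of-two dimensions help: the flat index of an element can be decomposed into ndarray coordinates with shifts and masks, instead of the div/mod sequence a general shape would require:

    def unravel_pow2(flat_index, log2_shape):
        """Decode a flat index into C-order coordinates.
        log2_shape[d] is log2 of the size of dimension d."""
        coords = [0] * len(log2_shape)
        for d in reversed(range(len(log2_shape))):
            lg = log2_shape[d]
            coords[d] = flat_index & ((1 << lg) - 1)  # == flat_index % 2**lg
            flat_index >>= lg                         # == flat_index // 2**lg
        return tuple(coords)

    # For a (4, 8) subtensor, flat index 13 is row 1, column 5:
    assert unravel_pow2(13, [2, 3]) == (1, 5)

On the GPU, each thread would do this decode for its own index; shifts and masks are cheap there, while integer division is comparatively expensive.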