Yes, that was indeed the problem. It works really nicely now; I'm getting speedups of up to ~5x.
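
For anyone who finds this thread later, the half-filled result can be reproduced without a GPU. The snippet below is only a simulation using Python's array module, not PyCUDA itself. It assumes the kernel writes `size` 4-byte floats into a buffer that was allocated for `size` 8-byte doubles, and the 0xff fill is a made-up stand-in for uninitialised device memory.

```python
import array

size = 100

# gpuarray.empty(size, float) allocates float64, i.e. 8 bytes per element.
# The 0xff fill is a stand-in for whatever happened to be in device memory.
buf = bytearray(b"\xff" * (size * array.array("d").itemsize))

# The kernel is declared with "float *out", so it writes `size` 4-byte zeros.
written = array.array("f", [0.0] * size).tobytes()
buf[:len(written)] = written

# Read the buffer back as float64, as out_gpu.get() would.
out = array.array("d", bytes(buf))

all_zero = all(x == 0 for x in out)
first_half_zero = all(x == 0 for x in out[:size // 2])
print(all_zero, first_half_zero)
```

On the real device, allocating with gpuarray.empty(size, numpy.float32) makes the buffer match the kernel's "float *out".
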
Regarding argument type checking: would it be possible to have a switch
for type checking as an argument to ElementwiseKernel?

Thanks,
Thomas

On Wed, Jun 29, 2011 at 3:13 PM, Andreas Kloeckner <[email protected]> wrote:
> On Wed, 29 Jun 2011 12:56:09 -0400, Thomas Wiecki <[email protected]> wrote:
>> This is with the version from the trunk
>> (7804dc6d1b40b506b02a5f7a0b7bde8771f1446c).
>>
>> import pycuda.driver as cuda
>> import pycuda.compiler
>> import pycuda.autoinit
>> import pycuda.gpuarray as gpuarray
>> from pycuda.elementwise import ElementwiseKernel
>>
>> zero_kernel = ElementwiseKernel(
>>     "float *out",
>>     "out[i] = pdf()",
>>     "test",
>>     preamble=
>>     """
>>     __device__ float pdf()
>>     {
>>         return 0;
>>     }
>>     """)
>>
>> size = 100
>> out_gpu = gpuarray.empty(size, float)
>> zero_kernel(out_gpu)
>>
>> print all(out_gpu.get() == 0)
>> print all(out_gpu.get()[:size/2] == 0)
>>
>> Produces output (for varying size):
>> False
>> True
>>
>> The second half is the same as before the elementwise kernel call.
>
> What's slightly treacherous here (but this is numpy's fault) is that
> "float" in the gpuarray.empty arg refers to Python's "float" type, which
> numpy will read as "float64", i.e. double precision. In the interest of
> speed, ElementwiseKernel does not do argument type checking. Maybe it
> should.
>
> HTH,
> Andreas
>

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
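
A rough sketch of what an optional type-checking switch could look like, written as a wrapper around the kernel call rather than as a change to PyCUDA. The names check_arg_dtypes, checked_call and check_types are hypothetical and not part of PyCUDA. The sketch assumes every argument exposes a `.dtype` attribute (as numpy arrays and GPUArrays do) and that the caller supplies the expected dtype names by hand.

```python
def check_arg_dtypes(expected_dtypes, args):
    """Raise TypeError if any argument's dtype differs from the expected one.

    Hypothetical helper, not part of PyCUDA. Arguments only need a
    `.dtype` attribute whose string form is a name such as "float32".
    """
    if len(expected_dtypes) != len(args):
        raise TypeError("expected %d arguments, got %d"
                        % (len(expected_dtypes), len(args)))
    for pos, (expected, arg) in enumerate(zip(expected_dtypes, args)):
        actual = str(arg.dtype)
        if actual != expected:
            raise TypeError("argument %d has dtype %s, kernel expects %s"
                            % (pos, actual, expected))


def checked_call(kernel, expected_dtypes, *args, check_types=True):
    """Call kernel(*args), validating dtypes first unless check_types is False."""
    if check_types:
        check_arg_dtypes(expected_dtypes, args)
    return kernel(*args)
```

With something like this, checked_call(zero_kernel, ["float32"], out_gpu) would raise a TypeError for the float64 array from the original report, while check_types=False would skip the check in a hot loop.
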
