Yes, that was indeed the problem. It works really nicely now, and I'm
getting speedups of up to ~5x.
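
The half-written array from the quoted report can be modelled on the
host without a GPU. The sketch below is only an illustration using the
standard library's array module, not PyCUDA itself: it assumes the
kernel behaves like a plain byte copy of `size` single-precision zeros
into a buffer that was allocated as `size` doubles.

```python
from array import array

size = 100

# What the host allocated: `size` doubles (8 bytes each), like float64.
buf = array("d", [1.0] * size)

# What a kernel declared with "float *out" writes: `size` singles (4 bytes each).
written = array("f", [0.0] * size).tobytes()

# Copy those bytes to the start of the buffer (assumed model of the kernel).
raw = memoryview(buf).cast("B")
raw[:len(written)] = written
raw.release()

all_zero = all(x == 0 for x in buf)
first_half_zero = all(x == 0 for x in buf[:size // 2])
print(all_zero, first_half_zero)
```

Only the first 400 of the 800 bytes are touched, so all_zero comes out
False and first_half_zero True, matching the output in the report.
Allocating with an explicit single-precision dtype, e.g.
gpuarray.empty(size, numpy.float32), makes the buffer agree with the
"float *out" declaration.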

Regarding argument type checking, would it be possible to have a
switch for type checking as an argument to ElementwiseKernel?
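
A rough sketch of what such a switch could look like from user code.
Everything here is hypothetical: make_checked, check_types and
FakeArray are made-up names for illustration and not part of PyCUDA.
The only assumption about real arrays is that they expose a .dtype
whose string form is a name like "float32".

```python
# Hypothetical helper, not part of PyCUDA: wraps any callable kernel and
# optionally verifies argument dtypes before calling it.
def make_checked(kernel, expected_dtypes, check_types=True):
    def call(*args):
        if check_types:
            for pos, (arg, want) in enumerate(zip(args, expected_dtypes)):
                # Fall back to the Python type name for scalar arguments.
                got = str(getattr(arg, "dtype", type(arg).__name__))
                if got != want:
                    raise TypeError(
                        "argument %d: expected %s, got %s" % (pos, want, got))
        return kernel(*args)
    return call


class FakeArray(object):
    """Stand-in for a GPU array so the sketch runs without a GPU."""
    def __init__(self, dtype):
        self.dtype = dtype


launched = []  # records every call that reached the "kernel"
checked = make_checked(launched.append, ["float32"])
checked(FakeArray("float32"))        # passes the check and is "launched"

message = None
try:
    checked(FakeArray("float64"))    # the mistake from this thread
except TypeError as exc:
    message = str(exc)
print(message)

# With the switch off, nothing is checked and the call goes straight through.
unchecked = make_checked(launched.append, ["float32"], check_types=False)
unchecked(FakeArray("float64"))
```

With check_types=False the wrapper adds no per-argument work, so the
speed-sensitive path stays as it is today.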

Thanks,
Thomas

On Wed, Jun 29, 2011 at 3:13 PM, Andreas Kloeckner
<[email protected]> wrote:
> On Wed, 29 Jun 2011 12:56:09 -0400, Thomas Wiecki <[email protected]> wrote:
>> This is with the version from the trunk
>> (7804dc6d1b40b506b02a5f7a0b7bde8771f1446c).
>>
>> import pycuda.driver as cuda
>> import pycuda.compiler
>> import pycuda.autoinit
>> import pycuda.gpuarray as gpuarray
>> from pycuda.elementwise import ElementwiseKernel
>>
>> zero_kernel = ElementwiseKernel(
>>     "float *out",
>>     "out[i] = pdf()",
>>     "test",
>>     preamble=
>>     """
>>     __device__ float pdf()
>>     {
>>         return 0;
>>     }
>>     """)
>>
>> size = 100
>> out_gpu = gpuarray.empty(size, float)
>> zero_kernel(out_gpu)
>>
>> print all(out_gpu.get() == 0)
>> print all(out_gpu.get()[:size/2] == 0)
>>
>> Produces output (for varying size):
>> False
>> True
>>
>> The second half is the same as before the elementwise kernel call.
>
> What's slightly treacherous here (but this is numpy's fault) is that
> "float" in the gpuarray.empty arg refers to Python's "float" type, which
> numpy will read as "float64", i.e. double precision. In the interest of
> speed, ElementwiseKernel does not do argument type checking. Maybe it
> should.
>
> HTH,
> Andreas
>
>

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
