Hello everybody,
I'm quite new to CUDA and PyCUDA.
I need a kernel that creates a matrix (of dimension nxd) out of an array (1xd),
by simply "repeating" the same array n times:
for example, suppose we have n = 4 and d = 3, then if the array is [1 2 3]
the result of my kernel should be:
[1 2 3
1 2 3
1 2 3
1 2 3] (a matrix 4x3)
Basically, it's the same as doing numpy.tile(array, (n, 1))
I've written the code below:
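import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

kernel_code_template = """
__global__ void TileKernel(float *in, float *out)
{
    // Each thread computes one element of out
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (y > %(n)s || x > %(d)s) return;
    out[y * %(d)s + x] = in[x];
}
"""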
d = 64
n = 512
blockSizex = 16
blockSizey = 16
# ceiling division; // keeps the grid sizes integers under Python 3 as well
gridSizex = (d + blockSizex - 1) // blockSizex
gridSizey = (n + blockSizey - 1) // blockSizey
# get the kernel code from the template
kernel_code = kernel_code_template % {
'd': d,
'n': n
}
mod = SourceModule(kernel_code)
TileKernel = mod.get_function("TileKernel")
vec_cpu = np.arange(d).astype(np.float32) # just as an example
vec_gpu = gpuarray.to_gpu(vec_cpu)
out_gpu = gpuarray.empty((n, d), np.float32)
TileKernel.prepare("PP")
TileKernel.prepared_call((gridSizex, gridSizey), (blockSizex, blockSizey, 1),
vec_gpu.gpudata, out_gpu.gpudata)
out_cpu = out_gpu.get()
Now, if I run this code with d equal to a power of 2 that is >= 16, I get the
right result (just like numpy.tile(vec_cpu, (n, 1))).
But if I set d to anything else (for example 88), every element of the output
matrix has the correct value except in the first column: some entries there are
right, but others have a different value (equal to d), and which entries of the
first column are wrong changes on every run.
I really can't figure out where the problem is, but maybe it's just something
simple that I'm missing...
Any help will be appreciated, thanks in advance!
Best regards,
Manuele
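
A possible explanation for the symptoms described above, offered as a sketch
rather than a verified diagnosis: the guard `if (y > n || x > d) return;` lets
threads with x == d through. Such a thread reads in[d] (one past the end of the
input) and writes it to out[y * d + d], which is element (y + 1, 0), racing with
the thread that legitimately owns that element. When d is a multiple of the
block width (16 here), no thread with x == d is launched, so the problem stays
hidden. The CPU-side emulation below needs no GPU; `emulate_tile_kernel`,
`fixed_guard`, and the sentinel value are hypothetical names introduced only for
this illustration, and the sequential loop order is an assumption that a real
GPU does not guarantee.

```python
import numpy as np

def emulate_tile_kernel(n, d, block=16, fixed_guard=False):
    """Run the kernel body once per launched thread, on the CPU.

    Returns the (n, d) result and the flat indices written by more than
    one thread (on a real GPU these writes would be data races).
    """
    SENTINEL = -1.0  # stands in for whatever lies past the end of `in`
    vec = np.append(np.arange(d, dtype=np.float32), SENTINEL)
    out = np.zeros((n + 1) * d + 1, dtype=np.float32)  # padded for stray writes
    writes = np.zeros(out.size, dtype=np.int64)

    grid_x = (d + block - 1) // block
    grid_y = (n + block - 1) // block
    for y in range(grid_y * block):
        for x in range(grid_x * block):
            if fixed_guard:
                skip = y >= n or x >= d  # assumed fix: reject x == d and y == n
            else:
                skip = y > n or x > d    # guard as posted
            if skip:
                continue
            out[y * d + x] = vec[x]
            writes[y * d + x] += 1

    raced = np.flatnonzero(writes > 1)
    return out[: n * d].reshape(n, d), raced

_, raced = emulate_tile_kernel(n=512, d=88)
print(raced.size, bool(np.all(raced % 88 == 0)))
```

Every index reported in `raced` is a multiple of d, i.e. it lies in the first
column, which matches the description above. If this reading is right, changing
the guard in the kernel to `if (y >= n || x >= d) return;` should remove the
stray writes.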
_______________________________________________
PyCUDA mailing list
https://lists.tiker.net/listinfo/pycuda