Hi,
I'm new to CUDA and PyCUDA, and am having a problem indexing across multiple
blocks in a grid. I'm using an older CUDA-enabled card (Quadro FX 1700) before
I begin writing for a larger GPU. I've been trying to understand the
relationship between threads, blocks, and grids in the context of my
particular card, so I've set up a simple script.
The following code works just fine, printing out an array of the values 0-99:
----------------------------------------------------------------------------------------------
import numpy
import pycuda.gpuarray as gpuarray
import pycuda.driver as drv
import pycuda.autoinit

def testgpu2():
    from pycuda.compiler import SourceModule
    mod = SourceModule("""
    __global__ void kernel1(float *z1)
    {
        const int i = (blockIdx.x * blockDim.x) + threadIdx.x;
        const int j = (blockIdx.y * blockDim.y) + threadIdx.y;
        const int idx = (j * blockDim.x) + i;  // flatten the 2D thread index
        z1[idx] = idx;
    }
    """)
    kernel1 = mod.get_function("kernel1")
    z1 = numpy.zeros(100).astype(numpy.float32)
    kernel1(drv.Out(z1), block=(10, 10, 1), grid=(1, 1))
    print(z1)
    return z1
----------------------------------------------------------------------------------------------
However, what if I have an array that's 1024 elements long? If I understand
the documentation correctly, block=(16,16,1) (256 threads) is the maximum
allowed for my hardware, which means I have to launch more than one block in
the grid. If I change the parameters of my script to:
z1 = numpy.zeros(1024).astype(numpy.float32)
kernel1(drv.Out(z1), block=(16, 16, 1), grid=(2, 2))
how do I correctly index the array locations in my kernel function
(z1[???] = ???)? There is a gridDim built-in variable, but no gridIdx to go
with it the way threadIdx and blockIdx exist for threads and blocks.
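To check my own understanding on the CPU side, here is a plain-Python sketch
that simulates the index arithmetic I *think* the kernel would need for the
block=(16,16,1), grid=(2,2) launch (this is just my guess, not verified on the
card; the kernel-side version is in the comment):

```python
import numpy

# CPU-side simulation of the index each CUDA thread would compute.
# The kernel-side arithmetic I have in mind would be:
#   i   = (blockIdx.x * blockDim.x) + threadIdx.x
#   j   = (blockIdx.y * blockDim.y) + threadIdx.y
#   idx = j * (blockDim.x * gridDim.x) + i   # row stride spans ALL blocks
block = (16, 16)  # blockDim.x, blockDim.y
grid = (2, 2)     # gridDim.x, gridDim.y

width = block[0] * grid[0]  # 32 "columns" across the whole grid
z1 = numpy.full(1024, -1.0, dtype=numpy.float32)

# loop over every (block, thread) pair, as the hardware would launch them
for by in range(grid[1]):
    for bx in range(grid[0]):
        for ty in range(block[1]):
            for tx in range(block[0]):
                i = bx * block[0] + tx
                j = by * block[1] + ty
                idx = j * width + i
                z1[idx] = idx

# every slot 0..1023 is written exactly once, with no collisions
assert (z1 == numpy.arange(1024, dtype=numpy.float32)).all()
```

The point of the `width` stride is that within a row the global x-index `i`
runs across all blocks, so the stride has to be blockDim.x * gridDim.x rather
than just blockDim.x.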
Thanks!
Mike
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda