Hi,
I'm new to CUDA and PyCUDA, and I've been having a problem indexing across 
multiple blocks.  I'm using an older CUDA-enabled card (Quadro FX 1700) before 
I begin writing for a larger GPU.  I've been trying to understand the 
relationship between threads, blocks, and grids in the context of my 
particular card.  To do so, I've set up a simple script.


The following code works just fine, printing out an array of the values 0-99:
----------------------------------------------------------------------------------------------

import numpy
import pycuda.gpuarray as gpuarray
import pycuda.driver as drv
import pycuda.autoinit

def testgpu2():
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void kernel1(float *z1)
    {
        const int i = (blockIdx.x * blockDim.x) + threadIdx.x;
        const int j = (blockIdx.y * blockDim.y) + threadIdx.y;
        const int idx = (j * blockDim.x) + i;  /* flatten (i, j) row by row */

        z1[idx] = idx;
    }
    """)

    kernel1 = mod.get_function("kernel1")

    z1 = numpy.zeros(100).astype(numpy.float32)

    kernel1(drv.Out(z1), block=(10,10,1), grid=(1,1))

    print z1

    return z1

----------------------------------------------------------------------------------------------
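To convince myself the index arithmetic is right, I also mirrored the 
flattening in plain Python -- this is just a sketch of how I understand a 
single 10x10 block, assuming each (threadIdx.x, threadIdx.y) pair flattens to 
j * blockDim.x + i:

```python
# Plain-Python model of one 10x10 block: every (tx, ty) thread pair
# should map to a unique flat index in 0-99.
block_dim_x, block_dim_y = 10, 10   # block=(10,10,1)

indices = []
for ty in range(block_dim_y):       # threadIdx.y
    for tx in range(block_dim_x):   # threadIdx.x
        indices.append(ty * block_dim_x + tx)

print(sorted(indices) == list(range(100)))  # True: each slot hit exactly once
```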

However, what if I have an array that's 1024 elements long?  If I understand 
the documentation correctly, block=(16,16,1) (256 threads) is the maximum 
allowed for my hardware, which means I have to launch additional blocks in 
the grid.  If I change the parameters of my script to:

     z1 = numpy.zeros((1024)).astype(numpy.float32)
     kernel1(drv.Out(z1),block=(16,16,1),grid=(2,2))

How do I correctly index the array locations in my kernel given multiple 
blocks (z1[???] = ???)?  There is a gridDim built-in variable, but no 
gridIdx, the way there are threadIdx and blockIdx for threads and blocks.
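My best guess, sketched in plain Python (and part of what I'm asking -- I'm 
not sure this mapping is right), is to build global coordinates from blockIdx 
and then flatten against the total grid width:

```python
# Hypothetical flattening for block=(16,16,1), grid=(2,2):
#   gx  = blockIdx.x * blockDim.x + threadIdx.x   (global column, 0-31)
#   gy  = blockIdx.y * blockDim.y + threadIdx.y   (global row,    0-31)
#   idx = gy * (blockDim.x * gridDim.x) + gx      (flat index,    0-1023)
block_x, block_y = 16, 16   # blockDim
grid_x, grid_y = 2, 2       # gridDim

indices = []
for by in range(grid_y):                # blockIdx.y
    for bx in range(grid_x):            # blockIdx.x
        for ty in range(block_y):       # threadIdx.y
            for tx in range(block_x):   # threadIdx.x
                gx = bx * block_x + tx
                gy = by * block_y + ty
                indices.append(gy * (block_x * grid_x) + gx)

print(sorted(indices) == list(range(1024)))  # True: mapping is one-to-one
```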


Thanks!
Mike



      
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
