Re: [PyCUDA] global memory alignment issue

Yifei Li Sat, 21 Jan 2012 10:44:25 -0800

On Sat, Jan 21, 2012 at 12:17 PM, Andreas Kloeckner <[email protected]> wrote:


> On Sat, 21 Jan 2012 11:24:33 -0500, Yifei Li <[email protected]> wrote:
> > On Fri, Jan 20, 2012 at 9:58 PM, Anthony LaTorre <[email protected]> wrote:
> >
> > > add the padding field!
> >
> > I know this will work. But without the padding field, the same program
> > written in C does not have any issue.
>
> I'm really not sure what you're trying to do here. Can you explain
> properly:
>

Sorry for the confusion.

>
> 0) what are you trying to do?
>
I'm trying to see whether the struct example in the tutorial still works
without the padding field. Here is the kernel function I use:

__global__ void test(DoubleOperation *a) {
        a = &a[blockIdx.x];
        printf("block %d: %d\n", blockIdx.x, a->datalen);
}

The kernel is launched using 2 blocks, each of which has one thread.



> a) what is the problem?
>
I actually have two questions.

1) The example fails without the padding field: the second block prints the
wrong 'datalen'. However, if I use the CUDA runtime API instead of PyCUDA,
the result is correct even without the padding.

2) Since the size of the struct without padding is 12 bytes, I tried a
different struct of the same size:
struct DoubleOperation {
     float x;
     float y;
     float z;
};
and changed the kernel function to:
__global__ void test(DoubleOperation *a) {
        a = &a[blockIdx.x];
        printf("block %d: %f %f %f\n", blockIdx.x, a->x, a->y, a->z);
}
But this time the values of x, y and z are printed correctly by both
blocks. So why does it work even though the struct's size is the same as
before?
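One way to look at question 2 from the host side, assuming a 64-bit host (8-byte pointers) and that NumPy's align=True layout matches what nvcc produces for these simple structs, is to compare the sum of the field sizes with the padded element stride of each struct:

```python
import numpy as np

# Host-side mirrors of the two device structs. align=True asks NumPy to
# insert the same padding a C compiler would for the equivalent struct.
int_then_ptr = np.dtype([("datalen", np.int32), ("ptr", np.intp)], align=True)
three_floats = np.dtype([("x", np.float32), ("y", np.float32),
                         ("z", np.float32)], align=True)

for name, dt in [("int + pointer", int_then_ptr), ("3 floats", three_floats)]:
    # dt.fields maps each field name to (field dtype, byte offset)
    field_bytes = sum(dt.fields[f][0].itemsize for f in dt.names)
    # itemsize is the stride between consecutive elements of a C array
    print(f"{name}: fields sum to {field_bytes} bytes, "
          f"stride {dt.itemsize}, alignment {dt.alignment}")
```

On such a host the pointer version should report a 16-byte stride with 8-byte alignment, while the three floats stay at 12 bytes with 4-byte alignment. So the two structs only look the same size: if the host side advances 12 bytes per element while the kernel's a[blockIdx.x] advances sizeof(DoubleOperation) = 16 bytes, block 1 reads from the wrong place, whereas with three floats both sides agree on 12.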



> b) what is this mysterious 'C program' you keep referring to? CUDA C?
>
I translated the PyCUDA code into an equivalent program that uses the CUDA
runtime API.


> c) what have you tried?
>
I tried changing the order of the fields in the struct, but the second
block still prints the wrong 'datalen':

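struct DoubleOperation {
        float *ptr;
        int datalen;
};


import numpy
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda

class VecStruct:
    # sum of the field sizes: 4-byte int plus one pointer
    mem_size = 4 + numpy.intp(0).nbytes

    def __init__(self, array, struct_arr_ptr):
        self.data = cuda.to_device(array)
        # ptr field at offset 0: the device address of the copied array
        cuda.memcpy_htod(int(struct_arr_ptr), numpy.intp(int(self.data)))
        # datalen field at offset 8 (assumes 8-byte pointers)
        cuda.memcpy_htod(int(struct_arr_ptr) + 8, numpy.int32(array.size))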
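A possible way to keep the host layout in step with the device, sketched under the same 64-bit assumption: describe the reordered struct as an aligned NumPy dtype and build the whole array on the host, so the stride comes from the dtype instead of a hand-computed mem_size. pack_double_operations is a hypothetical helper, not part of PyCUDA, and the device addresses in the example are dummy integers.

```python
import numpy as np

# Host-side mirror of: struct DoubleOperation { float *ptr; int datalen; };
# align=True adds the trailing padding a C compiler adds, so itemsize is
# the real sizeof (16 bytes on a 64-bit host, not 4 + 8 = 12).
DOUBLE_OPERATION = np.dtype([("ptr", np.intp), ("datalen", np.int32)],
                            align=True)


def pack_double_operations(device_ptrs, lengths):
    """Hypothetical helper: lay out a DoubleOperation[] array on the host.

    device_ptrs -- integer device addresses, e.g. int(cuda.to_device(arr))
    lengths     -- element count of each device array
    """
    host = np.zeros(len(device_ptrs), dtype=DOUBLE_OPERATION)
    host["ptr"] = device_ptrs
    host["datalen"] = lengths
    return host  # contiguous, padding included, ready for a single copy


# Dummy addresses stand in for real device pointers here.
structs = pack_double_operations([0x1000, 0x2000], [3, 5])
print(DOUBLE_OPERATION.itemsize, structs.nbytes,
      DOUBLE_OPERATION.fields["datalen"][1])
```

With PyCUDA, the device buffer could then be allocated with cuda.mem_alloc(structs.nbytes) and filled with one cuda.memcpy_htod(struct_arr, structs), so each element sits at a multiple of sizeof(DoubleOperation), which is what the kernel's indexing expects.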




> d) what were the outcomes?
>
> Andreas
>
>

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
