Re: [PyCUDA] global memory alignment issue

Yifei Li Sat, 21 Jan 2012 11:13:29 -0800

>
>
>> 0) what are you trying to do?
>>
> I'm trying to see whether the struct example in the tutorial still works
> without padding. This is the kernel function I use:
>
>  __global__ void test(DoubleOperation *a) {
>         a = &a[blockIdx.x];
>         printf("block %d: %d\n", blockIdx.x, a->datalen);
>  }
>
> The kernel is launched using 2 blocks, each of which has one thread.
>
>
>
>> a) what is the problem?
>>
> I actually have two questions.
>
> 1) The example fails to work without padding: the second block prints the
> wrong 'datalen'.
> However, if I use the CUDA runtime API instead of PyCUDA, the result is
> correct even without the padding.
>


The trick suggested by Stanley works. So let me answer this question
myself: the C compiler automatically works out that the size of the struct
should be 16 bytes instead of 12, which matches the size of the struct on
the device.
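
A quick way to see the 12 versus 16 difference from the Python side is the
standard struct module. This is only a sketch: it assumes the tutorial struct
is an int followed by a 64-bit pointer (modeled here as an unsigned 64-bit
integer) and a typical 64-bit host. The variable names are mine.

```python
import struct

# Assumed layout: int datalen followed by a 64-bit pointer (modeled as "Q").
# "=" means standard sizes with no padding, which is what you get when the
# fields are copied into the buffer back to back.
packed_size = struct.calcsize("=iQ")

# "@" means native sizes and alignment, i.e. what the C compiler lays out.
# The 8-byte member is moved up to the next multiple of its alignment.
aligned_size = struct.calcsize("@iQ")

print("packed:", packed_size, "aligned:", aligned_size)
```

On a 64-bit Linux host I would expect this to report 12 and 16, which is the
same gap between the hand-packed buffer and the compiler's layout.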

Actually, I found that if the program is written using the CUDA runtime API,
you don't need to worry about alignment at all. For example, I tried
several structs of different sizes, and the values of the fields in each
struct were always printed correctly.

Why is that? This seems to contradict the following (quoted from
Chapter 5 of the CUDA 4.0 programming guide):

*A typical case where this might be easily overlooked is when using some
custom global memory allocation scheme, whereby the allocations of multiple
arrays (with multiple calls to cudaMalloc() or cuMemAlloc()) is replaced by
the allocation of a single large block of memory partitioned into multiple
arrays, in which case the starting address of each array is offset from the
block's starting address.*
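
The scheme described in that passage can be sketched as follows. This is a
rough illustration, not code from the guide: align_up and partition are
hypothetical helpers, and the 256-byte default is an assumption about the
alignment you would want each sub-array to keep.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def partition(sizes, alignment=256):
    """Return (offsets, total_bytes) for sub-arrays carved out of one block.

    Each sub-array starts at an offset that is a multiple of `alignment`,
    so it stays aligned as long as the block's base address is.
    """
    offsets = []
    cursor = 0
    for size in sizes:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# Three arrays of 12, 100 and 300 bytes inside one allocation.
offsets, total = partition([12, 100, 300])
print(offsets, total)
```

Without the align_up step the second array would start at byte 12 of the
block, which is exactly the kind of offset the guide warns about.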


> 2) Since the size of the struct without padding is 12 bytes, I tried a
> different struct but of the same size:
> struct DoubleOperation{
>      float x;
>      float y;
>      float z;
> };
> And the kernel function is changed to
>  __global__ void test(DoubleOperation *a) {
>         a = &a[blockIdx.x];
>         printf("block %d: %f %f %f\n", blockIdx.x, a->x, a->y, a->z);
>  }
> But this time the values of x, y and z are printed correctly by both
> blocks. So why does it work even though the struct's size is the same as
> before?
>

However, I still don't have an answer for this.
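
One thing that may be worth checking for this second case is the alignment of
the members rather than the total size. Below is a small sketch with the
struct module, assuming a host where float is 4 bytes with 4-byte alignment.
element_offset is a hypothetical helper of mine.

```python
import struct

# Three floats: every member is 4 bytes with 4-byte alignment, so the
# compiler has nothing to pad and the packed and native sizes agree.
floats_packed = struct.calcsize("=3f")
floats_native = struct.calcsize("@3f")


def element_offset(index, struct_size):
    """Byte offset of element `index` in an array of structs."""
    return index * struct_size


# Where the second block's element starts under each layout.
print(floats_packed, floats_native, element_offset(1, floats_native))
```

If both sizes come out as 12, the hand-packed 12-byte layout already matches
what the compiler expects for three floats, which would be consistent with
both blocks printing correct values. A struct that holds a 64-bit pointer
would differ, because the pointer needs 8-byte alignment.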

>
>
>
>

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
