Quoted from Chapter 5 of the CUDA 4.0 Programming Guide, which may be relevant:
"Reading non-naturally aligned 8-byte or 16-byte words produces incorrect results"

But I still don't know how to fix the problem.

On Wed, Jan 18, 2012 at 3:01 AM, Andreas Kloeckner <[email protected]> wrote:

> On Tue, 17 Jan 2012 16:55:22 -0500, Yifei Li <[email protected]> wrote:
> > Hi all,
> >
> > I modified the example
> > http://documen.tician.de/pycuda/tutorial.html#advanced-topics by removing
> > the '__padding' from the structure definition and got incorrect result.
> > The kernel is launched with 2 blocks and one thread in each block.
> >
> > Each thread prints the 'len' field in structure, which should be 3 for
> > block 0 and 2 for block 1. However, the result I got is:
> >
> > block 1: 2097664
> > block 0: 3
> >
> > No such problem if I write the following program using C. Any help is
> > appreciated.
>
> It seems CUDA doesn't automatically align the pointer, without being
> told to?
>
> https://en.wikipedia.org/wiki/Data_structure_alignment
>
> Andreas
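
For what it's worth, here is a host-only sketch (plain Python with ctypes, no GPU needed) of one way this symptom could arise. It rests on assumptions the thread does not confirm: a 64-bit little-endian host, and host code that, once '__padding' is gone, packs each struct into 12 bytes (4-byte int immediately followed by the 8-byte pointer). The ctypes class DoubleOperation is only a stand-in for the device-side struct, and the two pointer values are hypothetical addresses picked for illustration.

```python
import ctypes
import struct

# ctypes stand-in for the device-side struct WITHOUT '__padding':
#     struct DoubleOperation { int datalen; float *ptr; };
# The compiler still aligns the 8-byte pointer, so padding is implicit.
class DoubleOperation(ctypes.Structure):
    _fields_ = [("datalen", ctypes.c_int), ("ptr", ctypes.c_void_p)]

natural_size = ctypes.sizeof(DoubleOperation)  # size including implicit padding
ptr_offset = DoubleOperation.ptr.offset        # where the compiler puts 'ptr'

# Hypothetical device addresses and the lengths from the original post.
ptrs = [0x200000, 0x200200]
lens = [3, 2]

# Assumed faulty host-side packing: "=" means standard sizes and NO padding,
# so each record is 4 + 8 = 12 bytes and the pointer sits at offset 4.
packed = b"".join(struct.pack("=iQ", n, p) for n, p in zip(lens, ptrs))

def read_lens(raw):
    """Reinterpret raw bytes as an array of two naturally aligned structs."""
    arr_type = DoubleOperation * 2
    padded = raw.ljust(ctypes.sizeof(arr_type), b"\0")
    arr = arr_type.from_buffer_copy(padded)
    return [item.datalen for item in arr]

# What a reader that expects natural alignment sees in the packed bytes.
misaligned = read_lens(packed)

# Same data laid out with the compiler's own offsets and stride.
aligned = read_lens(
    b"".join(bytes(DoubleOperation(n, p)) for n, p in zip(lens, ptrs))
)

print("sizeof:", natural_size, "ptr offset:", ptr_offset)
print("packed without padding:", misaligned)
print("naturally aligned:     ", aligned)
```

If that assumption holds, the first struct still reads correctly because it starts at offset 0, while the second is read from the wrong place and its length comes out as part of a pointer. The likely fix on the PyCUDA side would be to keep the layout the compiler expects: either leave '__padding' in, or write the pointer at ptr_offset and step through the buffer by natural_size bytes per struct.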
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
