Hi Tomasz,

On Fri, 12 Aug 2011 21:56:40 +0200, Tomasz Rybak <[email protected]> wrote:
> On Fri, 2011-07-29 at 02:15 -0400, Andreas Kloeckner wrote:
> > Hi Tomasz,
> >
> > On Mon, 21 Mar 2011 20:15:35 +0100, Tomasz Rybak
> > <[email protected]> wrote:
> > > I attach a patch updating pycuda.tools.DeviceData and
> > > pycuda.tools.OccupancyRecord to take new devices into
> > > consideration. I have tried to maintain the "style" of those
> > > classes and introduced changes only where necessary. I made the
> > > changes using my old notes and the NVIDIA Occupancy Calculator.
> > > Unfortunately, I currently do not have access to a Fermi card to
> > > test them fully.
> >
> > -        self.smem_granularity = 16
> > +        if dev.compute_capability() >= (2,0):
> > +            self.smem_granularity = 128
> > +        else:
> > +            self.smem_granularity = 512
> >
> > Way back in March, you submitted this patch, where smem_granularity
> > is documented as the number of threads taking part in a simultaneous
> > smem access. The new values just seem wrong. What am I missing, or
> > rather, what did you have in mind?
>
> I took those values from CUDA_Occupancy_Calculator.xls,
> sheet "GPU Data", cells C11-H12.
>
> Sorry for the mess. It looks like I misunderstood the meaning of
> smem_granularity. I assumed (following the xls file) that it was the
> minimum amount of shared memory that can be allocated. That reading
> follows from the source of OccupancyRecord (tools.py:294):
>
>     alloc_smem = _int_ceiling(shared_mem, devdata.smem_granularity)
>
> If I understand it correctly, this computes the amount of shared
> memory actually allocated, rounding it up to the nearest multiple of
> smem_granularity.
>
> With that assumption, my patch makes sense: one can allocate shared
> memory in blocks of 512 on 1.x devices and blocks of 128 on 2.x
> devices.
>
> So I do not understand why there is a difference between the
> documentation,
>
>     .. attribute:: smem_granularity
>
>         The number of threads that participate in banked,
>         simultaneous access to shared memory.
>
> and the code, which does not take threads into consideration when
> dealing with smem_granularity.
It seems this was my mess in the first place, for having misused
smem_granularity in the occupancy calculation. Sorry about that. Fixed:

http://git.tiker.net/pycuda.git/commitdiff/280d3d9679cd81708f6bf3da420d4e178ba9de05

Andreas
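For readers following along, the rounding Tomasz describes can be sketched as below. This is a minimal illustration, not PyCUDA code: `int_ceiling` mirrors what `pycuda.tools._int_ceiling` is used for in the quoted line, and `smem_alloc_granularity` is a hypothetical helper encoding the per-compute-capability allocation granularities from his patch (512 for 1.x, 128 for 2.x), which is distinct from the thread-count meaning in the docstring.

```python
def int_ceiling(value, multiple_of=1):
    # Round value up to the nearest multiple of multiple_of,
    # mirroring the role of pycuda.tools._int_ceiling in the
    # occupancy calculation quoted above.
    return (value + multiple_of - 1) // multiple_of * multiple_of

def smem_alloc_granularity(compute_capability):
    # Hypothetical helper: shared-memory *allocation* granularity
    # (bytes), per the values Tomasz quotes from the Occupancy
    # Calculator's "GPU Data" sheet -- not the banked-access
    # thread count described in the DeviceData docstring.
    return 128 if compute_capability >= (2, 0) else 512

# A kernel requesting 300 bytes of shared memory:
print(int_ceiling(300, smem_alloc_granularity((2, 0))))  # -> 384 (3 * 128)
print(int_ceiling(300, smem_alloc_granularity((1, 3))))  # -> 512 (1 * 512)
```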
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
