Hi Tomasz,

On Fri, 12 Aug 2011 21:56:40 +0200, Tomasz Rybak <[email protected]> wrote:
> On Fri, 2011-07-29 at 02:15 -0400, Andreas Kloeckner wrote:
> > Hi Tomasz,
> > 
> > On Mon, 21 Mar 2011 20:15:35 +0100, "Tomasz Rybak"
> > <[email protected]> wrote:
> > > I attach a patch updating pycuda.tools.DeviceData and
> > > pycuda.tools.OccupancyRecord to take new devices into consideration.
> > > I have tried to maintain the "style" of those classes and introduced
> > > changes only where necessary. I made the changes using my old notes
> > > and the NVIDIA Occupancy Calculator. Unfortunately, I currently do
> > > not have access to a Fermi card to test these fully.
> > 
> > -        self.smem_granularity = 16
> > +        if dev.compute_capability() >= (2,0):
> > +            self.smem_granularity = 128
> > +        else:
> > +            self.smem_granularity = 512
> > 
> > Way back in March, you submitted this patch, where smem_granularity is
> > documented as the number of threads taking part in a simultaneous smem
> > access. The new values just seem wrong. What am I missing, or rather,
> > what did you have in mind?
> 
> I have taken those values from CUDA_Occupancy_Calculator.xls,
> from sheet "GPU Data", cells C11-H12.
> 
> Sorry for the mess. It looks like I misunderstood the meaning of
> smem_granularity. I assumed (following the xls file) that it is the
> minimum size of shared memory that can be allocated. That reading is
> suggested by the source code in OccupancyRecord (tools.py:294):
>         alloc_smem = _int_ceiling(shared_mem, devdata.smem_granularity)
> If I understand it correctly, this computes the amount of allocated
> shared memory, rounded up to the nearest multiple of smem_granularity.
> 
> With these assumptions my patch makes sense: one can allocate shared
> memory in blocks of 512 bytes on 1.x devices and in blocks of 128 bytes
> on 2.x devices.
> 
> So I do not understand why there is a difference between the documentation
> "  .. attribute:: smem_granularity
>     
>     The number of threads that participate in banked, simultaneous
> access
>     to shared memory."
> and the code, which does not take threads into consideration when
> dealing with smem_granularity.

It seems this was my mess in the first place, for having misused
smem_granularity in the occupancy calculation. Sorry about that. Fixed:

http://git.tiker.net/pycuda.git/commitdiff/280d3d9679cd81708f6bf3da420d4e178ba9de05
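For the record, the semantics discussed in this thread can be sketched as
follows. Function names here are illustrative, not the actual pycuda
identifiers; the granularity values are the ones quoted above from the
Occupancy Calculator's "GPU Data" sheet:

```python
# Illustrative sketch, not the actual pycuda code: shared memory per block
# is rounded up to the device's allocation unit *in bytes* (512 on compute
# capability 1.x, 128 on 2.x), which is unrelated to the number of threads
# participating in a banked shared-memory access.

def int_ceiling(value, granularity):
    # Round `value` up to the nearest multiple of `granularity`.
    return granularity * ((value + granularity - 1) // granularity)

def smem_alloc_granularity(compute_capability):
    # Byte values taken from the CUDA Occupancy Calculator "GPU Data" sheet.
    return 128 if compute_capability >= (2, 0) else 512

def alloc_smem(shared_mem_bytes, compute_capability):
    # Shared memory effectively reserved per block for occupancy purposes.
    return int_ceiling(shared_mem_bytes,
                       smem_alloc_granularity(compute_capability))

print(alloc_smem(300, (2, 0)))  # 384 on a 2.x device (3 * 128)
print(alloc_smem(300, (1, 3)))  # 512 on a 1.x device (1 * 512)
```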

Andreas


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
