Sorry for the delay in responding.

On Fri, 2011-07-29 at 02:15 -0400, Andreas Kloeckner wrote:
> Hi Tomasz,
>
> On Mon, 21 Mar 2011 20:15:35 +0100, "Tomasz Rybak"
> <[email protected]> wrote:
> > I attach patch updating pycuda.tools.DeviceData and
> > pycuda.tools.OccupancyRecord to take new devices into consideration.
> > I have tried to maintain "style" of those classes and introduced
> > changes only when necessary. I have done changes using my old notes
> > and NVIDIA Occupancy Calculator. Unfortunately I currently do not
> > have access to Fermi to test those fully.
>
> -        self.smem_granularity = 16
> +        if dev.compute_capability() >= (2,0):
> +            self.smem_granularity = 128
> +        else:
> +            self.smem_granularity = 512
>
> Way back in March, you submitted this patch, where smem_granularity is
> documented as the number of threads taking part in a simultaneous smem
> access. The new values just seem wrong. What am I missing, or rather,
> what did you have in mind?

I have taken those values from CUDA_Occupancy_Calculator.xls,
from sheet "GPU Data", cells C11-H12.
Sorry for the mess. It looks like I misunderstood the meaning of
smem_granularity. I assumed (following the xls file) that it was the
minimum size of shared memory that can be allocated. The source code
of OccupancyRecord (tools.py:294) suggests the same reading:
    alloc_smem = _int_ceiling(shared_mem, devdata.smem_granularity)
If I understand it correctly, this computes the amount of allocated
shared memory by rounding the request up to the next multiple of
smem_granularity.
Under that assumption my patch makes sense: shared memory is allocated
in blocks of 512 on 1.x devices and in blocks of 128 on 2.x
devices.
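
To make that reading concrete, here is a minimal sketch of the rounding as
I understand it. This is not the PyCUDA code: round_up_to_multiple is a
hypothetical stand-in for _int_ceiling, the unit sizes are the ones I read
from the spreadsheet, and the requested size of 600 is a made-up figure.

```python
def round_up_to_multiple(value, granularity):
    """Round value up to the next multiple of granularity.

    Hypothetical helper standing in for _int_ceiling; assumes both
    arguments are non-negative integers and granularity > 0.
    """
    return ((value + granularity - 1) // granularity) * granularity


# Allocation unit sizes as read from the Occupancy Calculator sheet
# (assumption: these are allocation units, not thread counts).
ALLOC_UNIT_1X = 512
ALLOC_UNIT_2X = 128

requested = 600  # made-up shared memory request for one block

alloc_1x = round_up_to_multiple(requested, ALLOC_UNIT_1X)
alloc_2x = round_up_to_multiple(requested, ALLOC_UNIT_2X)

print(alloc_1x)  # 1024
print(alloc_2x)  # 640
```

With a granularity of 16 or 32 the same request would round to 608, which
is why the choice of value changes the computed occupancy.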
So I do not understand why there is a difference between the documentation
" .. attribute:: smem_granularity
     The number of threads that participate in banked, simultaneous
     access to shared memory."
and the code, which does not take threads into consideration when
dealing with smem_granularity.
Best regards.
>
> In any case, I've reverted them to 16/32 in git.
Why those values? Where did the original 16 come from?
Regards.
--
Tomasz Rybak <[email protected]> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
