Re: [PyCUDA] New occupancy values - patch

Tomasz Rybak Fri, 12 Aug 2011 12:58:00 -0700

Sorry for the delayed response.

On 2011-07-29 (Fri) at 02:15 -0400, Andreas Kloeckner wrote:
> Hi Tomasz,
> 
> On Mon, 21 Mar 2011 20:15:35 +0100, "Tomasz Rybak"
> <[email protected]> wrote:
> > I attach a patch updating pycuda.tools.DeviceData and
> > pycuda.tools.OccupancyRecord to take new devices into consideration.
> > I have tried to maintain the "style" of those classes and introduced
> > changes only where necessary. I made the changes using my old notes
> > and the NVIDIA Occupancy Calculator. Unfortunately, I currently do not
> > have access to a Fermi card to test them fully.
> 
> -        self.smem_granularity = 16
> +        if dev.compute_capability() >= (2,0):
> +            self.smem_granularity = 128
> +        else:
> +            self.smem_granularity = 512
> 
> Way back in March, you submitted this patch, where smem_granularity is
> documented as the number of threads taking part in a simultaneous smem
> access. The new values just seem wrong. What am I missing, or rather,
> what did you have in mind?


I took those values from CUDA_Occupancy_Calculator.xls,
sheet "GPU Data", cells C11-H12.

Sorry for the mess. It looks like I misunderstood the meaning of
smem_granularity. Based on the xls file, I assumed it was the minimum
amount of shared memory that can be allocated. The source code of
OccupancyRecord (tools.py:294) seems to agree with that:
        alloc_smem = _int_ceiling(shared_mem, devdata.smem_granularity)
If I understand it correctly, this computes the amount of allocated
shared memory by rounding shared_mem up to the nearest multiple of
smem_granularity.
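A minimal sketch of that rounding, for illustration only: int_ceiling
below is a hypothetical stand-in written from the behaviour described
above, not copied from tools.py, and the 600-byte request is an
arbitrary example value.

```python
def int_ceiling(value, multiple_of=1):
    """Round a non-negative integer up to the nearest multiple of multiple_of.

    Hypothetical stand-in for pycuda.tools._int_ceiling, based only on the
    behaviour described in this mail (not the actual PyCUDA source).
    """
    return ((value + multiple_of - 1) // multiple_of) * multiple_of


# Shared memory requested by one block, in bytes (illustrative value).
shared_mem = 600

# Allocation under the granularities proposed in the patch.
alloc_1x = int_ceiling(shared_mem, 512)  # compute capability 1.x
alloc_2x = int_ceiling(shared_mem, 128)  # compute capability 2.x

print(alloc_1x, alloc_2x)  # 1024 640
```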

Under that assumption my patch makes sense: shared memory is allocated
in units of 512 bytes on 1.x devices and 128 bytes on 2.x devices.
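To make the effect of those values concrete, here is a rough sketch of
how the allocation unit would limit the number of blocks per
multiprocessor. The function names are hypothetical, and the
per-multiprocessor sizes (16384 bytes on 1.x, the 48 KB configuration
of 49152 bytes on 2.x) are assumed inputs. The real OccupancyRecord also
looks at threads, registers and the hardware block limit, which this
sketch ignores.

```python
def smem_granularity(compute_capability):
    """Shared memory allocation unit in bytes, as proposed in the patch."""
    return 128 if compute_capability >= (2, 0) else 512


def round_up(value, multiple_of):
    """Round a non-negative integer up to the nearest multiple of multiple_of."""
    return ((value + multiple_of - 1) // multiple_of) * multiple_of


def blocks_limited_by_smem(shared_mem, smem_per_mp, compute_capability):
    """Blocks per multiprocessor if shared memory were the only constraint.

    Hypothetical helper; returns None when the block uses no shared memory,
    i.e. shared memory imposes no limit.
    """
    alloc = round_up(shared_mem, smem_granularity(compute_capability))
    if alloc == 0:
        return None
    return smem_per_mp // alloc


# 600 bytes per block; per-multiprocessor sizes are assumed values.
print(blocks_limited_by_smem(600, 16384, (1, 3)),
      blocks_limited_by_smem(600, 49152, (2, 0)))  # 16 76
```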

So I do not understand the discrepancy between the documentation

  .. attribute:: smem_granularity

    The number of threads that participate in banked, simultaneous
    access to shared memory.

and the code, which does not take threads into account at all when
dealing with smem_granularity.
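To show what the mismatch means in practice, this sketch feeds both sets
of values through the same byte rounding that alloc_smem uses. The table
and function names are hypothetical, and it assumes the reverted 16/32
apply to 1.x/2.x respectively, while 512/128 are the values from my
patch.

```python
def round_up(value, multiple_of):
    """Round a non-negative integer up to the nearest multiple of multiple_of."""
    return ((value + multiple_of - 1) // multiple_of) * multiple_of


# smem_granularity keyed by compute-capability major version.
# Assumption: the reverted 16/32 map to 1.x/2.x respectively.
REVERTED = {1: 16, 2: 32}   # documented as thread counts
PATCHED = {1: 512, 2: 128}  # Occupancy Calculator allocation units (bytes)


def allocated_smem(shared_mem, major, table):
    """Bytes alloc_smem would give if table[major] is the rounding unit."""
    return round_up(shared_mem, table[major])


for major in (1, 2):
    print(major,
          allocated_smem(600, major, REVERTED),
          allocated_smem(600, major, PATCHED))
```

For a 600-byte request this prints 608 vs 1024 bytes for 1.x and 608 vs
640 bytes for 2.x: whatever the documentation says, the code treats the
value as a byte rounding unit.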

Best regards.


> 
> In any case, I've reverted them to 16/32 in git.

Why those values? Where did you get the original 16 from?

Regards.

-- 
Tomasz Rybak <[email protected]> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
