That efficiency (72% of the theoretical peak) is not unusual for CUDA devices.

My GTX 580 3GB gives this result:

0.854149196146 utilization (1.0 is perfect utilization).
Achieved bandwidth: 156 GB/s
Theoretical maximum bandwidth: 183 GB/s
Fastest kernel execution time: 0.000488095998764
Optimum block shape: (144, 1, 1)

You should check whether ECC is turned on for your C2070 using the nvidia-smi
tool. Enabling ECC reduces memory throughput, which might explain the
difference between my 85% utilization and your 72% utilization.
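
For comparing the two runs, here is a small sketch that recomputes the
utilization from the printed bandwidths and estimates how much data each run
moved. The function names are mine, not from test_pycuda_speed.py, and I am
assuming the script means 1 GB = 10^9 bytes, which I have not checked.

```python
# Figures copied from the two runs in this thread (bandwidths in GB/s, times
# in seconds). The bandwidths are the rounded values the script printed.
runs = {
    "GTX 580": {"achieved": 156.0, "peak": 183.0, "time": 0.000488095998764},
    "C2070": {"achieved": 98.0, "peak": 136.0, "time": 0.000777023971081},
}


def utilization(achieved_gbs, peak_gbs):
    """Fraction of the theoretical peak bandwidth that was reached."""
    return achieved_gbs / peak_gbs


def bytes_moved(achieved_gbs, seconds):
    """Traffic in bytes implied by a bandwidth and a kernel run time.

    Assumes 1 GB = 1e9 bytes (an assumption, not taken from the script).
    """
    return achieved_gbs * 1e9 * seconds


for name, r in sorted(runs.items()):
    u = utilization(r["achieved"], r["peak"])
    mb = bytes_moved(r["achieved"], r["time"]) / 1e6
    print("%s: utilization %.3f, roughly %.1f MB moved" % (name, u, mb))
```

The ratios come out slightly below the utilization the script prints, because
the printed bandwidths are rounded to whole GB/s. Both runs imply about the
same amount of traffic (around 76 MB under the assumption above), so the two
cards appear to be doing the same work and the gap is in the memory system,
which is where ECC would show up.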

On Feb 16, 2012, at 2:57 PM, Jesse Lu wrote:

> Hi everyone,
> 
> I ran a simple experiment today, which consisted of trying to maximize the 
> memory (device memory) throughput of a very simple kernel. I was slightly 
> disappointed that I was only able to achieve 72% of the theoretical maximum 
> bandwidth. My GPU is a C2070. The file is attached and is executed using:
> 
> $ python test_pycuda_speed.py 
> 0.72196600476 utilization (1.0 is perfect utilization).
> Achieved bandwidth: 98 GB/s
> Theoretical maximum bandwidth: 136 GB/s
> Fastest kernel execution time: 0.000777023971081
> Optimum block shape: (160, 1, 1)
> .
> ----------------------------------------------------------------------
> Ran 1 test in 0.814s
> 
> OK
> 
> The questions that I have are:
>       • How close can others get to the theoretical peak bandwidth?
>       • Any suggested tweaks to increase performance?
> Thanks!
> 
> Jesse
> <test_pycuda_speed.py>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda

