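The GPU path has fixed per-call costs (the kernel launch, the
device-to-host copy in .get(), and on the first call building or
loading the reduction kernel) that do not amortize well over only
400 elements.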

What happens when you vary the size of the array over several orders
of magnitude?
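
A rough way to run that experiment is sketched below. It is only a
sketch, written for Python 3, and the helper names (best_time, sweep,
cpu_mean, gpu_mean, main) are made up for illustration. It assumes
wall-clock timing is acceptable, so the same timer covers both the
numpy and the GPU path, and it makes one untimed warm-up call per size
so that one-time costs are not counted.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    fn()  # untimed warm-up call: absorbs one-time costs (e.g. kernel build)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def sweep(make_task, sizes, repeats=5):
    """Time make_task(n)() for each n; setup done in make_task is not timed."""
    return [(n, best_time(make_task(n), repeats)) for n in sizes]


def cpu_mean(n):
    """Build a numpy array of n elements and return a callable computing its mean."""
    import numpy
    a = numpy.arange(n, dtype=numpy.float32)
    return lambda: numpy.sum(a) / a.size


def gpu_mean(n):
    """Same as cpu_mean, on the GPU. Requires PyCUDA and a CUDA device."""
    import numpy
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.gpuarray as gpuarray
    a_gpu = gpuarray.arange(n, dtype=numpy.float32)
    # .get() copies the result back to the host, so the call is synchronous
    return lambda: gpuarray.sum(a_gpu).get() / a_gpu.size


def main():
    sizes = [4 * 10**k for k in range(2, 7)]  # 400 up to 4,000,000 elements
    cpu = sweep(cpu_mean, sizes)
    gpu = sweep(gpu_mean, sizes)
    for (n, t_cpu), (_, t_gpu) in zip(cpu, gpu):
        print("n=%9d  numpy: %.6fs  gpu: %.6fs" % (n, t_cpu, t_gpu))
```

Calling main() prints one row per array size; the sizes where the GPU
column starts to win, if any, will depend on the hardware.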

Eli

On Mon, Apr 9, 2012 at 2:05 PM, Serra, Mr. Efren, Contractor, Code
7542 <[email protected]> wrote:
> import numpy
> """
> """
> import pycuda.driver as cuda
> import pycuda.tools
> import pycuda.gpuarray as gpuarray
> import pycuda.autoinit, pycuda.compiler
>
> a=numpy.arange(400)
> a_gpu=gpuarray.arange(400,dtype=numpy.float32)
>
> start=cuda.Event()
> end=cuda.Event()
> start.record()
> gpuarray.sum(a_gpu).get()/a.size
> end.record()
> end.synchronize()
> print "GPU array time: %fs" %(start.time_till(end)*1e-3)
>
> start.record()
> numpy.sum(a)/a.size
> end.record()
> end.synchronize()
> print "numpy array time: %fs" %(start.time_till(end)*1e-3)
>
> GPU array time: 0.000377s
> numpy array time: 0.000001s
>
> Efren A. Serra (Contractor)
> DeVine Consulting, Inc.
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave., STOP 2
> Monterey, CA 93943
> Code 7542
> Office: 831-656-4650
>
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
