Dear Efren,
Your numpy timings are incredible.
Array size: 4,000,000
GPU array time: 0.001961s
numpy array time: 0.000001s
This 1 microsecond seems to be rather constant.
start.record()
numpy.sum(a)/a.size
end.record()
end.synchronize()
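Could it be that this event-based timing code is only meant for asynchronous
GPU calls? The events are recorded on the GPU, so they may not account for
the work numpy does on the CPU. Try this: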
import timeit
t = timeit.Timer(setup="from __main__ import numpy, a",
                 stmt="numpy.sum(a)/a.size")
# average wall-clock time per call over 1000 runs
print("Numpy timing %g s" % (t.timeit(1000) / 1000))
The same approach could be interesting for your GPU calls if you want to get
the Python wall-clock times.
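
To make the idea reusable for both the numpy and the GPU case, here is a minimal sketch of a wall-clock helper built on timeit. The names wall_time and numpy_mean, the float32 dtype, and the random test data are assumptions for illustration; they are not taken from your script.

```python
import timeit
import numpy

def wall_time(fn, number=1000):
    """Return the average wall-clock seconds per call of fn over `number` calls."""
    return timeit.Timer(fn).timeit(number) / number

# Assumed test data: 4,000,000 random float32 values (dtype is a guess).
a = numpy.random.rand(4000000).astype(numpy.float32)

def numpy_mean():
    # Same expression as in the timed snippet: sum divided by element count.
    return numpy.sum(a) / a.size

per_call = wall_time(numpy_mean, number=100)
print("Numpy timing %g s" % per_call)
```

For a GPU call, the function passed to wall_time would presumably need to wait for the device to finish before returning; otherwise only the asynchronous launch is measured.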
Best,
Jon
_______________________________________________
PyCUDA mailing list
http://lists.tiker.net/listinfo/pycuda