Hi, I just ran some experiments with the CURAND wrappers. They seem to work
very nicely except for one little detail that I can't figure out. Initializing
the generator and generating the random numbers both seem very fast, but for
whatever reason PyCUDA takes a long time to "recover" after the number
generation. This pause is significantly longer than the actual computation,
and the delay increases with N. Here is an example:
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray
from pycuda.curandom import (PseudoRandomNumberGenerator,
                             QuasiRandomNumberGenerator)
import cProfile
import time as clock

def curand_prof():
    N = 100000000
    t1 = clock.time()
    # GPU: build the generator and fill an N-element array with normals
    rr = PseudoRandomNumberGenerator(0,
            np.random.random(128).astype(np.int32))
    data = pycuda.gpuarray.zeros([N], np.float32)
    rr.fill_normal_float(data.gpudata, N)
    t2 = clock.time()
    print "Bench 1: " + str(t2 - t1) + " sec"

if __name__ == "__main__":
    t1 = clock.time()
    curand_prof()
    t2 = clock.time()
    print "Bench 2: " + str(t2 - t1) + " sec"
Here is the actual output on a GTX 260 GPU:
Bench 1: 0.0117599964142 sec
Bench 2: 4.40562295914 sec
In this example the pause has no consequence, but if I want to use the
random array in another kernel ... it's quite a delay. I've done some
research, and my guess is that the problem is linked to this previously
reported issue:
http://forums.nvidia.com/index.php?showtopic=185740
Does anyone know how we could implement the solution in the wrapper?
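For what it's worth, my current theory: CUDA kernel launches are asynchronous,
so fill_normal_float probably returns before the generation has actually
finished, and the cost only shows up later when something forces a wait (here,
context teardown at interpreter exit). If that's right, calling
pycuda.driver.Context.synchronize() (or reading the array back with
data.get()) before taking t2 should move the delay into Bench 1. Here is a
minimal CPU-only analogy (no GPU needed), where a hypothetical fake_kernel
stands in for the CURAND kernel and join() plays the role of
Context.synchronize():

```python
import time
import threading

def fake_kernel():
    # Stand-in for the CURAND generation kernel: the real work happens here.
    time.sleep(0.5)

t1 = time.time()
worker = threading.Thread(target=fake_kernel)
worker.start()                    # returns immediately, like an async kernel launch
launch_time = time.time() - t1    # tiny, like "Bench 1"

worker.join()                     # explicit wait, analogous to Context.synchronize()
total_time = time.time() - t1     # includes the real work, like "Bench 2"

print("launch: %.4f s, total: %.4f s" % (launch_time, total_time))
```

The launch time is near zero while the total time includes the 0.5 s of
"work", which matches the Bench 1 / Bench 2 gap above.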
Martin
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda