Hi, I just ran some experiments with the CURAND wrappers. They seem to work
very nicely except for one detail that I can't figure out. The
initialization of the generator and the actual random number generation seem
very fast, but for whatever reason PyCUDA takes a long time to "recover"
after the number generation. This pause is significantly longer than the
actual computation, and the delay increases with N. Here is an example:


import numpy as np
import pycuda.autoinit
import pycuda.gpuarray
from pycuda.curandom import PseudoRandomNumberGenerator, \
    QuasiRandomNumberGenerator
import cProfile
import time as clock


def curand_prof():

    N = 100000000

    t1 = clock.time()
    # GPU
    rr = PseudoRandomNumberGenerator(0,
            np.random.random(128).astype(np.int32))
    data = pycuda.gpuarray.zeros([N], np.float32)
    rr.fill_normal_float(data.gpudata, N)
    t2 = clock.time()
    print("Bench 1: " + str(t2 - t1) + " sec")


if __name__ == "__main__":
    t1 = clock.time()
    curand_prof()
    t2 = clock.time()
    print("Bench 2: " + str(t2 - t1) + " sec")


Here is the actual output with a GTX 260 GPU:
Bench 1: 0.0117599964142 sec
Bench 2: 4.40562295914 sec

In this example the pause has no consequence, but if I want to use the
random array in another kernel, it's quite a delay. I've done some
research, and my guess is that the problem is linked to this previously
reported issue:

http://forums.nvidia.com/index.php?showtopic=185740

Does anyone know how we could implement the solution in the wrapper?

Martin
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
