Re: [PyCUDA] CURAND

Andreas Kloeckner Tue, 15 Feb 2011 14:39:22 -0800

On Tue, 15 Feb 2011 21:36:27 +0100, Tomasz Rybak <[email protected]> wrote:
> I disagree here. IMO it makes no sense to use more blocks than there
> are SMs, as that introduces the burden of switching blocks. With my code
> there is no switching between blocks: each SM gets a block to execute,
> runs the kernel generating random numbers, and finishes. After your change
> each SM gets a block, executes it, gets another block, ..., and finishes.
> 
> Each thread already generates multiple random numbers in the loop.
> After your change it just loops fewer times than in my code.
> 
> Time for generating 100 000 000 floats on GF104:
> using 3*SMs: 0.0315589904785
> using 1*SMs: 0.0291240215302
> Those times are repeatable - for 3x I get 0.031, for 1x I get 0.029.
> 
> So please - revert to previous state (just apply attached patch).


Can't argue with that.
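
For anyone following the thread, the trade-off can be sketched as plain arithmetic: with a fixed number of outputs, launching more blocks per SM only shortens each thread's loop. This is a rough illustration, not the code from the patch; the function name, the SM count of 8, and the 256 threads per block are assumptions chosen for the example.

```python
def iterations_per_thread(n, sm_count, blocks_per_sm, threads_per_block):
    """Loop trips each thread needs so the whole grid covers n outputs.

    All parameter names are hypothetical; they are not PyCUDA API.
    """
    total_threads = sm_count * blocks_per_sm * threads_per_block
    # Ceiling division: the last trip may be only partially used.
    return -(-n // total_threads)


n = 100_000_000  # number of floats from the timing quoted above
for blocks_per_sm in (1, 3):
    # sm_count=8 and threads_per_block=256 are assumed values.
    trips = iterations_per_thread(n, 8, blocks_per_sm, 256)
    print(f"{blocks_per_sm} block(s) per SM: {trips} loop trips per thread")
```

The total amount of work is the same in both cases; only its division between loop trips and blocks changes, which is consistent with the small timing difference reported above.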

> OK, but do not punish Fermi for Tesla's shortcomings; I added a test and use
> half the threads only on Tesla. Fermi should still use the maximum number
> of threads.

Also fair.
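
A minimal sketch of what such a test might look like, assuming that "Tesla" means compute capability 1.x and "Fermi" means 2.x. The helper name and its arguments are hypothetical and are not taken from the patch; in PyCUDA the two inputs would presumably come from the device's compute capability and its maximum threads per block.

```python
def pick_threads_per_block(compute_capability, max_threads_per_block):
    """Choose a block size based on the device generation.

    compute_capability: (major, minor) tuple, e.g. (1, 3) or (2, 1).
    max_threads_per_block: device limit, supplied by the caller.
    Both names are assumptions made for this illustration.
    """
    major, _minor = compute_capability
    if major < 2:
        # Tesla-generation device: use only half of the limit.
        return max_threads_per_block // 2
    # Fermi and newer: use the full limit.
    return max_threads_per_block


print(pick_threads_per_block((1, 3), 512))
print(pick_threads_per_block((2, 1), 1024))
```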

I've applied your patch.

Andreas


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
