On Tue, 15 Feb 2011 21:36:27 +0100, Tomasz Rybak <[email protected]> wrote:
> I disagree here. IMO it makes no sense to use more blocks than there
> are SMs, as it introduces the burden of switching blocks. In the case of
> my code there is no switching between blocks - an SM gets a block to
> execute, executes the kernel generating random numbers, and finishes.
> After your change an SM gets a block, executes it, gets another block,
> ..., finishes.
>
> Each thread already generates multiple random numbers in the loop.
> After your change it just loops fewer times than in my code.
>
> Time for generating 100 000 000 floats on GF104:
> using 3*SMs: 0.0315589904785
> using 1*SMs: 0.0291240215302
> Those times are repeatable - for 3x I get 0.031, for 1x I get 0.029.
>
> So please - revert to previous state (just apply attached patch).

Can't argue with that.

> OK, but do not punish Fermi for the shortcomings of Tesla; I added a test
> and use half the threads only on Tesla. Fermi should still use the maximum
> number of threads.

Also fair. I've applied your patch.

Andreas
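
For readers following the thread, the trade-off discussed above can be sketched in plain Python. This is only an illustrative model, not the PyCUDA generator code: the helper name `iterations_per_thread`, the SM count of 8, and the 256 threads per block are all assumptions chosen for the example. It shows that launching 3 blocks per SM does not change the total work; each thread simply loops fewer times, while the hardware has to schedule more blocks.

```python
# Illustrative model only: how a fixed number of random values is split
# across a kernel launch. All names and constants here are hypothetical.

def iterations_per_thread(total, blocks, threads_per_block):
    """Loop count each thread needs so that all `total` values are produced."""
    total_threads = blocks * threads_per_block
    # Ceiling division: the last pass may be only partially used.
    return -(-total // total_threads)

sm_count = 8             # assumed; a real device reports its own SM count
threads_per_block = 256  # assumed block size
n = 100_000_000          # number of floats, as in the timing above

for multiplier in (1, 3):
    blocks = multiplier * sm_count
    loops = iterations_per_thread(n, blocks, threads_per_block)
    print("blocks = %d*SMs = %d, loops per thread = %d"
          % (multiplier, blocks, loops))
```

Under these assumptions the 1*SMs launch keeps one resident block per SM with a long loop, whereas the 3*SMs launch cuts the loop to roughly a third but makes each SM pick up several blocks in turn, which is the overhead the timings above point to.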
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
