Dear R gurus,

I am interested in permutations-based cpu-intensive methods so I had to pay
a little attention to Random Number Generators (RNG).
For my needs, RNGs have to:
   1) be fast. I profiled my algorithms, and for some the bottleneck was
the RNG.
   2) be scalable. Meaning that I want the RNG to remain fast as I add
threads.
   3) offer a long cycle length. Some basic generators have a cycle length
so low that in a few seconds you can finish it, making further computations
useless and redundant
   4) be able to give reproducible results independent of the number of
threads used, i.e. I want my program to give the very same exact results
using one or 10 threads
   ( 4) "be good" of course )

I found an implementation that seems to meet my criterion and made a
preliminary package to test it.
In the meantime Petr Savicky contacted saying he was about to release a
similar package called rngOpenMP.

So I decided to perform some quick benchmarks. The benchmark code is
available as a R package "rngBenchmarks" here:
https://r-forge.r-project.org/scm/viewvc.php/pkg/?root=gwas-bin-tests
but it depends on some unpublished package, like rngOpenMP, and my
preliminary package, yet available from the same URL.

As a benchmark I implemented a Monte-Carlo computation of PI.
I tried to use the exact same computation method, using a template argument
for the RNG, and providing wrappers for the
different available RNGs, except for the rngOpenMP that is not
instantiable, so I adapted specifically the code.

I included in the benchmark:
 -  the c implementation used by the R package Rlecuyer
 - the (GNU) random_r RNG available on GNU/linux systems and that is
reentrant
 - my RcppRandomSFMT,wrapping a modified version of the SIMD-oriented Fast
Mersenne Twister (SFMT) Random Number Generator
    provided by http://www.agner.org/random Randomc
 - rngOpenMP
I tried to include the rsprng RNG, but could not manage to use it in my
code.


My conclusions:
  - all the implementations work, meaning that the computed values converge
towards PI with the number of iterations
  - all the implementations are scalable.
  - RcppRandomSFMT and random_r are an order of magnitude faster than
rlecuyer and rngOpenMP
  - actually RcppRandomSFMT and random_r have very similar performance.

The problem with random_r is that its cycle length according to my manpage
is ~ 3E10, enabling for instance only 3 millions permutations of a vector
of 10,000 elements,
to be compared with

Leaving the RcppRandomSFMT as best candidate. This implementation also
allows multiple seeds, solving my requisite number 4, reproducible results
independent of the number of threads, if I use as second seed the task
identifier.

Of course I am probably biased, so please tell me if you have some better
ideas of benchmarks, tests of correctness, if you'd like some other
implementations to be included.

People interested in this topic couldcontact me in order that we
collaboratively propose an implementation suiting all needs.

Thanks,

Karl Forner

Annex:

I ran the benchmarks on a linux Intel(R) Xeon(R) with 2 cpus of 4 cores
each ( CPU  E5520  @ 2.27GHz).

        type threads     n        error    time time_per_chunk
1     lecuyer       1 1e+07 2.105472e-04   1.538     0.00153800
2     lecuyer       1 1e+08 4.441492e-05  15.265     0.00152650
3     lecuyer       1 1e+09 2.026819e-05 153.209     0.00153209
4     lecuyer       2 1e+07 3.182633e-04   0.821     0.00082100
5     lecuyer       2 1e+08 7.375036e-05   7.751     0.00077510
6     lecuyer       2 1e+09 9.290323e-06  76.476     0.00076476
7     lecuyer       4 1e+07 9.630351e-05   0.401     0.00040100
8     lecuyer       4 1e+08 1.263486e-05   3.887     0.00038870
9     lecuyer       4 1e+09 1.151515e-06  38.618     0.00038618
10    lecuyer       8 1e+07 1.239703e-05   0.241     0.00024100
11    lecuyer       8 1e+08 7.894518e-05   2.133     0.00021330
12    lecuyer       8 1e+09 6.782041e-06  20.420     0.00020420
13   random_r       1 1e+07 7.898746e-05   0.137     0.00013700
14   random_r       1 1e+08 4.748343e-05   1.290     0.00012900
15   random_r       1 1e+09 1.685692e-05  12.844     0.00012844
16   random_r       2 1e+07 4.757590e-06   0.095     0.00009500
17   random_r       2 1e+08 7.389450e-05   0.663     0.00006630
18   random_r       2 1e+09 2.913732e-05   6.469     0.00006469
19   random_r       4 1e+07 1.664590e-04   0.037     0.00003700
20   random_r       4 1e+08 1.138106e-04   0.330     0.00003300
21   random_r       4 1e+09 3.734717e-05   3.209     0.00003209
22   random_r       8 1e+07 1.034678e-04   0.051     0.00005100
23   random_r       8 1e+08 4.733472e-05   0.167     0.00001670
24   random_r       8 1e+09 1.985413e-05   1.694     0.00001694
25 rng_openmp       1 1e+07 2.097492e-04   1.231     0.00123100
26 rng_openmp       1 1e+08 7.580436e-05  12.155     0.00121550
27 rng_openmp       1 1e+09 2.772810e-05 120.712     0.00120712
28 rng_openmp       2 1e+07 6.870833e-05   0.609     0.00060900
29 rng_openmp       2 1e+08 1.071006e-04   6.095     0.00060950
30 rng_openmp       2 1e+09 4.296624e-05  61.426     0.00061426
31 rng_openmp       4 1e+07 3.000218e-04   0.312     0.00031200
32 rng_openmp       4 1e+08 6.313562e-05   3.066     0.00030660
33 rng_openmp       4 1e+09 1.436418e-05  30.441     0.00030441
34 rng_openmp       8 1e+07 2.767216e-04   0.191     0.00019100
35 rng_openmp       8 1e+08 9.341252e-06   1.614     0.00016140
36 rng_openmp       8 1e+09 6.874726e-06  16.121     0.00016121
37       sfmt       1 1e+07 2.313942e-04   0.114     0.00011400
38       sfmt       1 1e+08 1.485365e-06   1.186     0.00011860
39       sfmt       1 1e+09 1.083604e-05  11.417     0.00011417
40       sfmt       2 1e+07 1.898866e-04   0.062     0.00006200
41       sfmt       2 1e+08 2.627199e-06   0.573     0.00005730
42       sfmt       2 1e+09 1.295063e-05   5.779     0.00005779
43       sfmt       4 1e+07 1.691328e-04   0.033     0.00003300
44       sfmt       4 1e+08 4.942283e-05   0.315     0.00003150
45       sfmt       4 1e+09 1.232928e-05   2.871     0.00002871
46       sfmt       8 1e+07 1.001232e-04   0.022     0.00002200
47       sfmt       8 1e+08 3.991173e-05   0.149     0.00001490
48       sfmt       8 1e+09 5.775921e-06   1.550     0.00001550

         type threads     n        error    time time_div_threads
1     lecuyer       8 1e+07 6.628918e-05   0.269         0.033625
2     lecuyer       8 1e+08 8.265030e-05   2.033         0.254125
3     lecuyer       8 1e+09 6.422987e-06  20.684         2.585500
4     lecuyer       8 1e+10 1.785588e-06 203.427        25.428375
5    random_r       8 1e+07 2.349934e-04   0.017         0.002125
6    random_r       8 1e+08 6.640785e-05   0.172         0.021500
7    random_r       8 1e+09 2.074285e-05   1.701         0.212625
8    random_r       8 1e+10 2.084929e-05  17.145         2.143125
9  rng_openmp       8 1e+07 2.737931e-04   0.159         0.019875
10 rng_openmp       8 1e+08 7.173399e-07   1.624         0.203000
11 rng_openmp       8 1e+09 7.219774e-06  16.097         2.012125
12 rng_openmp       8 1e+10 4.310556e-06 161.093        20.136625
13       sfmt       8 1e+07 1.332275e-04   0.015         0.001875
14       sfmt       8 1e+08 4.277652e-05   0.149         0.018625
15       sfmt       8 1e+09 3.728551e-06   1.501         0.187625
16       sfmt       8 1e+10 6.365431e-06  15.041         1.880125

        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to