On 12-10-19 7:04 PM, Hervé Pagès wrote:
Hi,

Looks like the implementation of random number generation changed in
R-devel with respect to R-2.15.1.

With R-2.15.1:

    > set.seed(33)
    > sample(49821115, 10)
     [1] 22217252 19661919 24099911 45779422 42043111 25774933 21778053 17098516
     [9]   773073  5878451

With recent R-devel:

    > set.seed(33)
    > sample(49821115, 10)
     [1] 22217252 19661919 24099912 45779425 42043115 25774935 21778056 17098518
     [9]   773073  5878452

This is on a 64-bit Ubuntu system.

Is this change intended? I didn't see anything in the NEWS file.

A potential problem with this is that it will break unit tests
for algorithms that make use of the RNG.

Another, more practical problem (at least for me) is the following:
Bioconductor package maintainers sometimes work hard on the
development version of their package to improve the performance of
some key functions. Comparing performance between BioC release
(based on R-2.15) and devel (based on R-devel) often requires big
input data that is randomly generated, because that's easier than
working with real data. Typically a small script is written that
takes care of loading the required packages, generating the input
data, and running a simple analysis. The same script is sourced in
R-2.15 and R-devel, and performance and results are compared.

Not being able to generate exactly the same input in the script is
a problem. It can be worked around by generating the input once,
serializing it, and using load() in the script, but that makes things
more complicated and the script is no longer standalone (it cannot be
passed around without also passing around the big .rda file).
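The generate-once workaround, sketched in Python with pickle standing in for R's save()/load(). This is only an illustration of the pattern; the helper names, the file name and the seed are hypothetical:

```python
import os
import pickle
import random


def load_or_generate(path, generate):
    """Return the object cached at `path`; on first use, build and save it."""
    if os.path.exists(path):
        # Later runs: read the saved input instead of regenerating it.
        with open(path, "rb") as f:
            return pickle.load(f)
    # First run: generate the input and save it for every later run.
    data = generate()
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data


def make_input():
    """Hypothetical stand-in for the benchmark's randomly generated input."""
    rng = random.Random(33)
    return rng.sample(range(49821115), 10)
```

Usage would be `data = load_or_generate("bench_input.pkl", make_input)`. Every run after the first sees the same input, but the saved file now has to travel with the script, which is exactly the inconvenience described above.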

Thanks,
H.


I think it was mentioned in the NEWS:

 \code{sample.int()} has some support for  \eqn{n \ge
 2^{31}}{n >= 2^31}: see its help for the limitations.

 A different algorithm is used for \code{(n, size, replace = FALSE,
 prob = NULL)} for \code{n > 1e7} and \code{size <= n/2}.  This
 is much faster and uses less memory, but does give different results.
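A back-of-the-envelope account of why the two transcripts differ only slightly. It assumes (the NEWS entry does not say this) that the old code is a partial Fisher-Yates shuffle drawing from a pool that shrinks by one per draw, and the new code draws from the full range 1..n and rejects repeats, both consuming the same uniforms u_i in [0, 1) after set.seed(33). Here i = 0, 1, ... counts the draws, and rare events (a rejected repeat, or landing on an already-vacated pool slot) are ignored:

```latex
\begin{aligned}
y_i^{\mathrm{old}} &= \lfloor (n - i)\,u_i \rfloor + 1,
  \qquad y_i^{\mathrm{new}} = \lfloor n\,u_i \rfloor + 1, \\
y_i^{\mathrm{new}} - y_i^{\mathrm{old}}
  &= \lfloor (n - i)\,u_i + i\,u_i \rfloor - \lfloor (n - i)\,u_i \rfloor
  \in \bigl[\, \lfloor i\,u_i \rfloor,\ \lceil i\,u_i \rceil \,\bigr]
  \subseteq [\,0,\ i\,].
\end{aligned}
```

The observed differences (R-devel minus R-2.15.1) are 0, 0, 1, 3, 4, 2, 3, 2, 0, 1 for i = 0, ..., 9, each within this bound, which is consistent with that reading.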

I don't think the old algorithm is available, but perhaps it could be made available by an optional parameter.
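A rough Python model of such an optional parameter, under the same assumption as above: the old method is a partial Fisher-Yates shuffle and the new one is rejection sampling that remembers earlier draws in a hash set. The function and the `use_hash` switch are hypothetical, not R's API; passing both branches the same list of uniforms plays the role of calling set.seed() before each run.

```python
import math
from typing import Iterator, Optional


def sample_no_replace(n: int, size: int, uniforms: Iterator[float],
                      use_hash: Optional[bool] = None) -> list[int]:
    """Draw `size` distinct values from 1..n, in the spirit of sample(n, size).

    `use_hash` (hypothetical): True = rejection sampling over the full range,
    False = partial Fisher-Yates over a shrinking pool. None picks by the
    NEWS thresholds (n > 1e7 and size <= n/2).
    """
    if use_hash is None:
        use_hash = n > 1e7 and size <= n / 2
    out: list[int] = []
    if use_hash:
        # Draw from 1..n every time; skip values already seen.
        seen: set[int] = set()
        while len(out) < size:
            v = math.floor(n * next(uniforms)) + 1
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    # Pool starts as 0..n-1; a drawn slot is refilled with the last pooled
    # value and the pool shrinks by one. Only refilled slots are stored,
    # which yields the same sequence as a full array of n slots.
    refilled: dict[int, int] = {}
    remaining = n
    for _ in range(size):
        j = math.floor(remaining * next(uniforms))
        out.append(refilled.get(j, j) + 1)
        remaining -= 1
        refilled[j] = refilled.get(remaining, remaining)
    return out
```

Calling it twice on `iter(us)` for the same list `us`, once with `use_hash=False` and once with `use_hash=True`, should reproduce the pattern in the transcripts: the first value agrees and later values drift up by at most their 0-based position. Keeping the shrinking-pool branch behind an explicit flag would let a script ask for the old sequence while the faster path stays the default.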

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
