Gabe

As mentioned on Twitter, I think the following behavior should be fixed as part of the upcoming changes:

R.version.string
## [1] "R Under development (unstable) (2019-02-25 r76160)"
.Machine$double.digits
## [1] 53
set.seed(123)
RNGkind()
## [1] "Mersenne-Twister" "Inversion"        "Rejection"
length(table(runif(1e6)))
## [1] 999863

I don't expect any collisions when using Mersenne-Twister to generate a million floating point values. I'm not sure what causes this behavior, but it's documented in ?Random:

"Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values (Wichmann-Hill is the exception, and all give at least 30 varying bits.)"

The "Wichman-Hill" bit is interesting:

RNGkind("Wichmann-Hill")
length(table(runif(1e6)))
## [1] 1000000
length(table(runif(1e6)))
## [1] 1000000

Mersenne-Twister has a much much larger periodicity than Wichmann-Hill, it would be great to see the above behavior also for Mersenne-Twister. Thanks for considering.


Best regards

Kirill


On 20.02.19 08:01, Gabriel Becker wrote:
Luke,

I'm happy to help with this. Its great to see this get tackled (I've cc'ed
Kelli Ottoboni who helped flag this issue).

I can prepare a patch for the RNGkind related stuff and the doc update.

As for ???, what are your (and others') thoughts about the possibility of
a) a reproducibility API which takes either an R version (or maybe
alternatively a date) and sets the RNGkind to the default for that
version/date, and/or b) that sessionInfo be modified to capture (and
display) the RNGkind in effect.

Best,
~G


On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke <luke-tier...@uiowa.edu>
wrote:

Before the next release we really should to sort out the bias issue in
sample() reported by Ottoboni and Stark in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
filed aa a bug report by Duncan Murdoch at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.

Here are two examples of bad behavior through current R-devel:

      set.seed(123)
      m <- (2/5) * 2^32
      x <- sample(m, 1000000, replace = TRUE)
      table(x %% 2, x > m / 2)
      ##
      ##    FALSE   TRUE
      ## 0 300620 198792
      ## 1 200196 300392

      table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
      ##
      ##      0      1
      ## 429054 570946

I committed a modification to R_unif_index to address this by
generating random bits (blocks of 16) and rejection sampling, but for
now this is only enabled if the environment variable R_NEW_SAMPLE is
set before the first call.

Some things still needed:

- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allowing the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???

Unfortunately I don't have enough free cycles to do this, but I can
help if someone else can take the lead.

There are two other places I found that might suffer from the same
issue, in walker_ProbSampleReplace (pointed out bu O & S) and in
src/nmath/wilcox.c.  Both can be addressed by using R_unif_index. I
have done that for walker_ProbSampleReplace, but the wilcox change
might need adjusting to support the standalone math library and I
don't feel confident enough I'd get that right.

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
     Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to