On 19/09/2018 12:09 PM, Philip B. Stark wrote:
The 53 bits only encode at most 2^{32} possible values, because the source of the float is the output of a 32-bit PRNG (the obsolete version of MT). 53 bits isn't the relevant number here.

No, two calls to unif_rand() are used. There are two 32 bit values, but some of the bits are thrown away.

Duncan Murdoch


The selection ratios can get close to 2. Computer scientists don't do it the way R does, for a reason.

Regards,
Philip

On Wed, Sep 19, 2018 at 9:05 AM Duncan Murdoch <murdoch.dun...@gmail.com <mailto:murdoch.dun...@gmail.com>> wrote:

    On 19/09/2018 9:09 AM, Iñaki Ucar wrote:
     > El mié., 19 sept. 2018 a las 14:43, Duncan Murdoch
     > (<murdoch.dun...@gmail.com <mailto:murdoch.dun...@gmail.com>>)
    escribió:
     >>
     >> On 18/09/2018 5:46 PM, Carl Boettiger wrote:
     >>> Dear list,
     >>>
     >>> It looks to me that R samples random integers using an
    intuitive but biased
     >>> algorithm by going from a random number on [0,1) from the PRNG
    to a random
     >>> integer, e.g.
     >>>
    https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808
     >>>
     >>> Many other languages use various rejection sampling approaches
    which
     >>> provide an unbiased method for sampling, such as in Go, python,
    and others
     >>> described here: https://arxiv.org/abs/1805.10941 (I believe the
    biased
     >>> algorithm currently used in R is also described there).  I'm
    not an expert
     >>> in this area, but does it make sense for the R to adopt one of
    the unbiased
     >>> random sample algorithms outlined there and used in other
    languages?  Would
     >>> a patch providing such an algorithm be welcome? What concerns
    would need to
     >>> be addressed first?
     >>>
     >>> I believe this issue was also raised by Killie & Philip in
     >>> http://r.789695.n4.nabble.com/Bug-in-sample-td4729483.html, and
    more
     >>> recently in
     >>>
    https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf
    <https://www.stat.berkeley.edu/%7Estark/Preprints/r-random-issues.pdf>,
     >>> pointing to the python implementation for comparison:
     >>>
    
https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py#L265
     >>
     >> I think the analyses are correct, but I doubt if a change to the
    default
     >> is likely to be accepted as it would make it more difficult to
    reproduce
     >> older results.
     >>
     >> On the other hand, a contribution of a new function like
    sample() but
     >> not suffering from the bias would be good.  The normal way to
    make such
     >> a contribution is in a user contributed package.
     >>
     >> By the way, R code illustrating the bias is probably not very
    hard to
     >> put together.  I believe the bias manifests itself in sample()
    producing
     >> values with two different probabilities (instead of all equal
     >> probabilities).  Those may differ by as much as one part in
    2^32.  It's
     >
     > According to Kellie and Philip, in the attachment of the thread
     > referenced by Carl, "The maximum ratio of selection probabilities can
     > get as large as 1.5 if n is just below 2^31".

    Sorry, I didn't write very well.  I meant to say that the difference in
    probabilities would be 2^-32, not that the ratio of probabilities would
    be 1 + 2^-32.

    By the way, I don't see the statement giving the ratio as 1.5, but
    maybe
    I was looking in the wrong place.  In Theorem 1 of the paper I was
    looking in the ratio was "1 + m 2^{-w + 1}".  In that formula m is your
    n.  If it is near 2^31, R uses w = 57 random bits, so the ratio
    would be
    very, very small (one part in 2^25).

    The worst case for R would happen when m  is just below  2^25, where w
    is at least 31 for the default generators.  In that case the ratio
    could
    be about 1.03.

    Duncan Murdoch



--
Philip B. Stark | Associate Dean, Mathematical and Physical Sciences | Professor,  Department of Statistics |
University of California
Berkeley, CA 94720-3860 | 510-394-5077 | statistics.berkeley.edu/~stark <http://statistics.berkeley.edu/%7Estark> |
@philipbstark


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to