On 19/09/2018 12:09 PM, Philip B. Stark wrote:
The 53 bits only encode at most 2^{32} possible values, because the
source of the float is the output of a 32-bit PRNG (the obsolete version
of MT). 53 bits isn't the relevant number here.
No, two calls to unif_rand() are used. There are two 32-bit values, but
some of the bits are thrown away.
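For concreteness, here is a rough R-level sketch of that construction
(illustration only, not the actual C code in RNG.c; runif() stands in for
the C-level unif_rand(), and the 2^25 scale factor is an assumption on my
part; per the w = 57 figure later in the thread, this two-call path is
what applies for large n, while smaller n use a single call):

# Two draws combined into one uniform value with roughly 25 + 32 = 57
# bits; some of the second call's low-order bits are lost, as noted above.
two_call_unif <- function() {
    U <- 2^25
    (floor(U * runif(1)) + runif(1)) / U
}
# The sampled index is then essentially floor(n * two_call_unif()) + 1.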
Duncan Murdoch
The selection ratios can get close to 2. Computer scientists don't do it
the way R does, for a reason.
Regards,
Philip
On Wed, Sep 19, 2018 at 9:05 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
On 19/09/2018 9:09 AM, Iñaki Ucar wrote:
> On Wed, Sep 19, 2018 at 14:43, Duncan Murdoch (<murdoch.dun...@gmail.com>) wrote:
>>
>> On 18/09/2018 5:46 PM, Carl Boettiger wrote:
>>> Dear list,
>>>
>>> It looks to me that R samples random integers using an intuitive but
>>> biased algorithm by going from a random number on [0,1) from the PRNG
>>> to a random integer, e.g.
>>> https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808
>>>
>>> Many other languages use various rejection sampling approaches which
>>> provide an unbiased method for sampling, such as in Go, python, and
>>> others described here: https://arxiv.org/abs/1805.10941 (I believe the
>>> biased algorithm currently used in R is also described there). I'm not
>>> an expert in this area, but does it make sense for R to adopt one of
>>> the unbiased random sampling algorithms outlined there and used in
>>> other languages? Would a patch providing such an algorithm be welcome?
>>> What concerns would need to be addressed first?
>>>
>>> I believe this issue was also raised by Kellie & Philip in
>>> http://r.789695.n4.nabble.com/Bug-in-sample-td4729483.html, and more
>>> recently in
>>> https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf,
>>> pointing to the python implementation for comparison:
>>> https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py#L265
>>
>> I think the analyses are correct, but I doubt if a change to the
>> default is likely to be accepted, as it would make it more difficult to
>> reproduce older results.
>>
>> On the other hand, a contribution of a new function like sample() but
>> not suffering from the bias would be good. The normal way to make such
>> a contribution is in a user contributed package.
>>
>> By the way, R code illustrating the bias is probably not very hard to
>> put together. I believe the bias manifests itself in sample() producing
>> values with two different probabilities (instead of all equal
>> probabilities). Those may differ by as much as one part in 2^32. It's
>
> According to Kellie and Philip, in the attachment of the thread
> referenced by Carl, "The maximum ratio of selection probabilities can
> get as large as 1.5 if n is just below 2^31".
Sorry, I didn't write very well. I meant to say that the difference in
probabilities would be 2^-32, not that the ratio of probabilities would
be 1 + 2^-32.

By the way, I don't see the statement giving the ratio as 1.5, but maybe
I was looking in the wrong place. In Theorem 1 of the paper I was looking
at, the ratio was "1 + m 2^{-w + 1}". In that formula m is your n. If it
is near 2^31, R uses w = 57 random bits, so the ratio would be very, very
small (one part in 2^25).

The worst case for R would happen when m is just below 2^25, where w is
at least 31 for the default generators. In that case the ratio could be
about 1.03.
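To make that arithmetic concrete, and since R code illustrating the bias
was mentioned above as easy to put together, here is a small sketch (my
own illustration, not code from R or from the paper) that evaluates the
Theorem 1 expression for the two cases, and shows the two-probability
effect with a deliberately tiny word size so it is visible:

# Ratio of selection probabilities from Theorem 1: 1 + m * 2^(-w + 1),
# for m outcomes generated from w random bits.
ratio_bound <- function(m, w) 1 + m * 2^(-(w - 1))

ratio_bound(2^31, 57)   # ~ 1 + 2^-25: one part in 2^25
ratio_bound(2^25, 31)   # 1.03125: about 1.03

# The same effect made visible with a tiny word size: w = 4 bits means u
# takes the 16 values k/16, and floor(m * u) maps either 2 or 3 of those
# grid points to each outcome.
w <- 4; m <- 6
counts <- table(floor(m * (0:(2^w - 1)) / 2^w))
counts                     # outcomes 0..5 get 3, 3, 2, 3, 3, 2 grid points
max(counts) / min(counts)  # 1.5, within the bound 1 + 6 * 2^-3 = 1.75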
Duncan Murdoch
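As a footnote to the suggestion above about a sample()-style function that
avoids the bias: the usual rejection approach looks roughly like the
sketch below (a sketch under stated assumptions, not a proposed
implementation; rword() is a hypothetical stand-in for a source of uniform
w-bit integers, built here from runif() purely for illustration).

# Unbiased draw from 1..m by rejection: draw a w-bit integer, reject it if
# it falls in the short leftover block at the top of the range, otherwise
# reduce modulo m.  Every value in 1..m then has exactly the same
# probability, given a truly uniform w-bit source.
unbiased_index <- function(m, w = 31,
                           rword = function() floor(2^w * runif(1))) {
    limit <- 2^w - (2^w %% m)   # largest multiple of m that is <= 2^w
    repeat {
        v <- rword()            # uniform on 0 .. 2^w - 1 (by assumption)
        if (v < limit) return(v %% m + 1)
    }
}

unbiased_index(10)  # a single draw from 1..10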
--
Philip B. Stark | Associate Dean, Mathematical and Physical Sciences |
Professor, Department of Statistics |
University of California, Berkeley, CA 94720-3860 | 510-394-5077 |
statistics.berkeley.edu/~stark | @philipbstark
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel