Re: [Rd] A different error in sample()

2018-09-19 Thread Wolfgang Huber
FWIW, I suspect this is related to the function R_unif_index that was introduced in src/main/RNG.c around revision 72356, or the way this function is used in do_sample in src/main/random.c. On 20.9.18 08:19, Wolfgang Huber wrote: Besides wording of the documentation re truncating vs rounding, t

Re: [Rd] A different error in sample()

2018-09-19 Thread Wolfgang Huber
Besides wording of the documentation re truncating vs rounding, there is something peculiar going on with the fractional part of n: > table(sample.int(2.5, 1e6, replace = TRUE)) 1 2 3 399051 401035 199914 > table(sample.int(3, 1e6, replace = TRUE)) 1 2 3 332956 3
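
One way to see where proportions like these could come from is to assume the index is computed as floor(n * u) + 1 for a uniform u on [0,1), with the non-integer n used as is. This is only a sketch under that assumption, not R's actual code; the helper name tally_floor_index and the 1024-point grid standing in for the random draws are hypothetical.

```c
/* Hypothetical helper: tallies how often floor(n * u) lands on each
 * zero-based index when u runs over the grid i / 1024, i = 0..1023.
 * The grid is a stand-in for uniform draws on [0,1); every grid value
 * is exactly representable in binary floating point. */
void tally_floor_index(double n, unsigned counts[], unsigned len)
{
    for (unsigned k = 0; k < len; k++)
        counts[k] = 0;

    for (unsigned i = 0; i < 1024; i++) {
        double u = i / 1024.0;
        /* The cast truncates toward zero, which equals floor() here
         * because n * u is never negative. */
        unsigned idx = (unsigned)(n * u);
        if (idx < len)
            counts[idx]++;
    }
}
```

With n = 2.5 the three indices receive 410, 410 and 204 of the 1024 grid points, roughly 0.4, 0.4 and 0.2, which is the same shape as the first table quoted above.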

Re: [Rd] A different error in sample()

2018-09-19 Thread Dario Strbenac
Good day, The use of "rounding" also doesn't make sense. If the number is halfway between two integers, it is rounded to the nearest even integer. > round(2.5) [1] 2 -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Carl Boettiger
For a well-tested C algorithm, based on my reading of Lemire, the unbiased "algorithm 3" in https://arxiv.org/abs/1805.10941 is already part of the C standard library in OpenBSD and macOS (as arc4random_uniform), and in the GNU standard library. Lemire also provides C++ code in the appendix of his

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Ben Bolker
A quick point of order here: arguing with Duncan in this forum is helpful to expose ideas, but probably neither side will convince the other; eventually, if you want this adopted in core R, you'll need to convince an R-core member to pursue this fix. In the meantime, a good, well-tested impl

Re: [Rd] Poor documentation for "adj" and text()

2018-09-19 Thread frederik
Thanks Martin. I wouldn't necessarily fault Ulrich for his subject line - unless you want to propose a better one... I might fault him for not following up and checking out the patch that I submitted at his prompting. I noticed that you committed my patch last Friday, with some welcome improvemen

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Duncan Murdoch
On 19/09/2018 5:57 PM, David Hugh-Jones wrote: It doesn't seem too hard to come up with plausible ways in which this could give bad results. Suppose I sample rows from a large dataset, maybe for bootstrapping. Suppose the rows are non-randomly ordered, e.g. odd rows are males, even rows are f

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread David Hugh-Jones
It doesn't seem too hard to come up with plausible ways in which this could give bad results. Suppose I sample rows from a large dataset, maybe for bootstrapping. Suppose the rows are non-randomly ordered, e.g. odd rows are males, even rows are females. Oops! Very non-representative sample, bootstr

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Duncan Murdoch
On 19/09/2018 3:52 PM, Philip B. Stark wrote: Hi Duncan-- Nice simulation! The absolute difference in probabilities is small, but the maximum relative difference grows from something negligible to almost 2 as m approaches 2**31. Because the L_1 distance between the uniform distribution on {

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Philip B. Stark
One more thing, apropos this: I'm still not convinced that there has ever been a simulation run with detectable bias compared to Monte Carlo error unless it (like this one) was designed specifically to show the problem. I often use random permutations to simulate p-values to calibrate permutatio

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Philip B. Stark
Hi Duncan-- Nice simulation! The absolute difference in probabilities is small, but the maximum relative difference grows from something negligible to almost 2 as m approaches 2**31. Because the L_1 distance between the uniform distribution on {1, ..., m} and what you actually get is large, ther

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Philip B. Stark
No, the 2nd call only happens when m > 2**31. Here's the code: (RNG.c, lines 793ff) double R_unif_index(double dn) { double cut = INT_MAX; switch(RNG_kind) { case KNUTH_TAOCP: case USER_UNIF: case KNUTH_TAOCP2: cut = 33554431.0; /* 2^25 - 1 */ break; default: break;

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Philip B. Stark
The 53 bits only encode at most 2^{32} possible values, because the source of the float is the output of a 32-bit PRNG (the obsolete version of MT). 53 bits isn't the relevant number here. The selection ratios can get close to 2. Computer scientists don't do it the way R does, for a reason. Regar
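
The counting argument behind unequal selection probabilities can be checked exhaustively on a toy scale. The sketch below assumes a hypothetical generator with only 2^8 equally likely outputs x, mapped to an index by floor(m * x / 256). It is not R's generator or R's code, and it says nothing about the exact ratio in R; the function name toy_hit_range is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical toy model: 256 equally likely generator outputs x are
 * mapped to indices 0..m-1 by floor(m * x / 256). Reports the smallest
 * and largest number of outputs that land on any one index.
 * Requires 1 <= m <= 256. */
void toy_hit_range(uint32_t m, uint32_t *min_hits, uint32_t *max_hits)
{
    uint32_t hits[256] = {0};

    for (uint32_t x = 0; x < 256; x++)
        hits[(m * x) >> 8]++;          /* (m * x) >> 8 is floor(m * x / 256) */

    *min_hits = hits[0];
    *max_hits = hits[0];
    for (uint32_t k = 1; k < m; k++) {
        if (hits[k] < *min_hits) *min_hits = hits[k];
        if (hits[k] > *max_hits) *max_hits = hits[k];
    }
}
```

In this toy model, m = 128 gives every index exactly 2 hits, while m = 129 leaves some indices with 1 hit and others with 2, so some outcomes are twice as likely as others once m passes half the number of generator outputs.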

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Philip B. Stark
That depends on the number of replications, among other things. Moreover, because of the bias, the usual formulae for uncertainty in estimates based on random samples, etc., are incorrect: sample() does not give a simple random sample. On Wed, Sep 19, 2018 at 9:15 AM Duncan Murdoch wrote: > On

Re: [Rd] A different error in sample()

2018-09-19 Thread Joris Meys
I believe the word "truncated" is causing the confusion. 3 is "the next smallest integer" following 2.5. But it is not the truncation done by trunc(). Rewording to "rounding the next smallest integer" would get rid of that confusion imho. Cheers Joris On Wed, Sep 19, 2018 at 7:57 PM Duncan Murdoc

[Rd] A different error in sample()

2018-09-19 Thread Duncan Murdoch
This may be a doc error or a coding bug. The help page for sample says: "Non-integer positive numerical values of n or x will be truncated to the next smallest integer, which has to be no larger than .Machine$integer.max." This is not true: > table(sample(2.5, 100, replace = TRUE))

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Duncan Murdoch
On 19/09/2018 12:23 PM, Philip B. Stark wrote: No, the 2nd call only happens when m > 2**31. Here's the code: Yes, you're right. Sorry! So the ratio really does come close to 2. However, the difference in probabilities between outcomes is still at most 2^-32 when m is less than that cutoff.

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Duncan Murdoch
On 19/09/2018 12:09 PM, Philip B. Stark wrote: The 53 bits only encode at most 2^{32} possible values, because the source of the float is the output of a 32-bit PRNG (the obsolete version of MT). 53 bits isn't the relevant number here. No, two calls to unif_rand() are used. There are two 32 b

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Duncan Murdoch
On 19/09/2018 9:40 AM, David Hugh-Jones wrote: On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch wrote: I think the analyses are correct, but I doubt if a change to the default is likely to be accepted as it would make it more difficult to reprod

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Duncan Murdoch
On 19/09/2018 9:09 AM, Iñaki Ucar wrote: On Wed, 19 Sept 2018 at 14:43, Duncan Murdoch wrote: On 18/09/2018 5:46 PM, Carl Boettiger wrote: Dear list, It looks to me that R samples random integers using an intuitive but biased algorithm by going from a random number on [0,1) from th

Re: [Rd] R-admin typo

2018-09-19 Thread Tomas Kalibera
Thanks, Tomas On 09/19/2018 06:15 AM, Colin Gillespie wrote: Hi, Section 3.2 of the R-admin manual https://cran.ma.imperial.ac.uk/doc/manuals/r-release/R-admin.html#Testing-a-Windows-Installation could be improved. The particular sentence is The Rtools are not needed to run these tests. but

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Ben Bolker
On 2018-09-19 09:40 AM, David Hugh-Jones wrote: > On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch > wrote: > >> >> I think the analyses are correct, but I doubt if a change to the default >> is likely to be accepted as it would make it more difficult to reproduce >> older results. > > > I'm a b

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread David Hugh-Jones
On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch wrote: > > I think the analyses are correct, but I doubt if a change to the default > is likely to be accepted as it would make it more difficult to reproduce > older results. I'm a bit alarmed by the logic here. Unbiased sampling seems basic for a s

[Rd] R-admin typo

2018-09-19 Thread Colin Gillespie
Hi, Section 3.2 of the R-admin manual https://cran.ma.imperial.ac.uk/doc/manuals/r-release/R-admin.html#Testing-a-Windows-Installation could be improved. The particular sentence is The Rtools are not needed to run these tests. but more comprehensive analysis of errors will be given if diff is i

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Iñaki Ucar
On Wed, 19 Sept 2018 at 14:43, Duncan Murdoch wrote: > > On 18/09/2018 5:46 PM, Carl Boettiger wrote: > > Dear list, > > > > It looks to me that R samples random integers using an intuitive but biased > > algorithm by going from a random number on [0,1) from the PRNG to a random > > inte

Re: [Rd] Bias in R's random integers?

2018-09-19 Thread Duncan Murdoch
On 18/09/2018 5:46 PM, Carl Boettiger wrote: Dear list, It looks to me that R samples random integers using an intuitive but biased algorithm by going from a random number on [0,1) from the PRNG to a random integer, e.g. https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808 Many

[Rd] Bias in R's random integers?

2018-09-19 Thread Carl Boettiger
Dear list, It looks to me that R samples random integers using an intuitive but biased algorithm by going from a random number on [0,1) from the PRNG to a random integer, e.g. https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808 Many other languages use various rejection sampling
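
For readers unfamiliar with the term, one common form of rejection sampling for bounded random integers can be sketched as follows. This is a minimal illustration, not R's code, not Lemire's algorithm, and not necessarily the variant used by any language mentioned in the thread. The names xorshift32_next, rejection_threshold and uniform_below are hypothetical, and the xorshift generator is only a stand-in source of 32-bit values.

```c
#include <stdint.h>

/* Stand-in 32-bit generator for the demo (Marsaglia's xorshift32).
 * The state must be nonzero. Because xorshift32 never outputs 0 it is
 * not itself perfectly uniform over all 2^32 values; it is used here
 * only to drive the example. */
uint32_t xorshift32_next(void *state)
{
    uint32_t x = *(uint32_t *)state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *(uint32_t *)state = x;
    return x;
}

/* 2^32 mod n, computed without 64-bit arithmetic. Requires n >= 1. */
uint32_t rejection_threshold(uint32_t n)
{
    return (UINT32_MAX - n + 1u) % n;
}

/* Returns a value in 0..n-1. Draws below the threshold are discarded,
 * so the number of accepted 32-bit values is an exact multiple of n and
 * every residue x % n is hit by the same number of accepted values. */
uint32_t uniform_below(uint32_t n, uint32_t (*next)(void *), void *state)
{
    uint32_t threshold = rejection_threshold(n);
    uint32_t x;
    do {
        x = next(state);
    } while (x < threshold);
    return x % n;
}
```

A call looks like `uint32_t seed = 2463534242u; uint32_t die = uniform_below(6u, xorshift32_next, &seed) + 1u;`. The cost of removing the bias is that the number of generator calls per result is no longer fixed, which is part of why changing a default sampler affects reproducibility of older results.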