Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Philip B. Stark
The same issue occurs in walker_ProbSampleReplace() in random.c, lines 386-387: rU = unif_rand() * n; k = (int) rU; Cheers, Philip On Wed, Sep 19, 2018 at 3:08 PM Duncan Murdoch wrote: > On 19/09/2018 5:57 PM, David Hugh-Jones wrote: > > > > It doesn't seem too hard to come up with plausible w
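
Editorial note: a scaled-down sketch of why the scale-and-truncate pattern quoted above can be biased. It assumes a hypothetical toy generator with only 256 equally likely outputs (TOY_RANGE, scaled_index and the other names are made up for illustration and are not part of R), so that every case can be enumerated.

```c
#include <assert.h>

/* Toy stand-in for a generator with only 2^8 equally likely outputs.
   This is an assumption for illustration, not one of R's generators. */
#define TOY_RANGE 256

/* Scale and truncate: the same shape as rU = unif_rand() * n; k = (int) rU; */
int scaled_index(unsigned raw, int n)
{
    double u = (double) raw / TOY_RANGE; /* one of 0/256, 1/256, ..., 255/256 */
    return (int) (u * n);
}

/* Count how many of the 256 raw values land on each index in [0, n). */
void count_hits(int n, int *hits)
{
    assert(n > 0 && n <= TOY_RANGE);
    for (int k = 0; k < n; k++)
        hits[k] = 0;
    for (unsigned raw = 0; raw < TOY_RANGE; raw++)
        hits[scaled_index(raw, n)]++;
}

/* Number of indices in [0, n) that are produced by exactly `times` raw values. */
int indices_hit(int n, int times)
{
    int hits[TOY_RANGE];
    int total = 0;
    count_hits(n, hits);
    for (int k = 0; k < n; k++)
        total += (hits[k] == times);
    return total;
}
```

With n = 102, which does not divide 256, some indices are reached by three raw values and the rest by two, so the former are 1.5 times as likely. With n = 128 every index is reached by exactly two raw values and the mapping is uniform.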

Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Steve Grubb
Hello, On Thursday, September 20, 2018 11:15:04 AM EDT Duncan Murdoch wrote: > On 20/09/2018 6:59 AM, Ralf Stubner wrote: > > On 9/20/18 1:43 AM, Carl Boettiger wrote: > >> For a well-tested C algorithm, based on my reading of Lemire, the > >> unbiased "algorithm 3" in https://arxiv.org/abs/1805.1

Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Hervé Pagès
Hi, Note that it wouldn't be the first time that sample() changes behavior in a non-backward compatible way: https://stat.ethz.ch/pipermail/r-devel/2012-October/065049.html Cheers, H. On 09/20/2018 08:15 AM, Duncan Murdoch wrote: On 20/09/2018 6:59 AM, Ralf Stubner wrote: On 9/20/18 1:43

Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Gabe Becker
Hi all, On Thu, Sep 20, 2018 at 9:30 AM, Paul Gilbert wrote: > > > There are only two small problems that occur to me: > > 1/ Researchers that want to have reproducible results (all I hope) need to > be aware the change has happened. In theory they should have recorded the > RNG they were using,

Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Paul Gilbert
On 09/19/2018 10:03 AM, Ben Bolker wrote: ... Balancing backward compatibility and correctness is a tough problem here. I think improvements in the RNG is a situation where backward compatibility is not really going to be lost, because people can specify the old generator, they just wil

Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Duncan Murdoch
On 20/09/2018 11:01 AM, Radford Neal wrote: From: Duncan Murdoch Let's try it: > m <- (2/5)*2^32 > m > 2^31 [1] FALSE > x <- sample(m, 1000000, replace = TRUE) > table(x %% 2) 0 1 399850 600150 Since m is an even number, the true proportions of evens and odds should be

Re: [Rd] A different error in sample()

2018-09-20 Thread Martin Maechler
> Martin Maechler > on Thu, 20 Sep 2018 09:20:46 +0200 writes: > Wolfgang Huber > on Thu, 20 Sep 2018 08:47:47 +0200 writes: >> FWIW, I suspect this is related to the function >> R_unif_index that was introduced in src/main/RNG.c around >> revision 72356, or

Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Duncan Murdoch
On 20/09/2018 6:59 AM, Ralf Stubner wrote: On 9/20/18 1:43 AM, Carl Boettiger wrote: For a well-tested C algorithm, based on my reading of Lemire, the unbiased "algorithm 3" in https://arxiv.org/abs/1805.10941 is part already of the C standard library in OpenBSD and macOS (as arc4random_uniform)

Re: [Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X

2018-09-20 Thread Seth Russell
Thanks for the warning about fork without exec(). A co-worker of mine, also on Mac, ran the sample code and got an error about that exact problem. Thanks also for the pointer to try curl::multi_add() or download.file() with a vector of files. My actual use case includes downloading the files and

Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Radford Neal
> From: Duncan Murdoch > Let's try it: > > > m <- (2/5)*2^32 > > m > 2^31 > [1] FALSE > > x <- sample(m, 1000000, replace = TRUE) > > table(x %% 2) > > 0 1 > 399850 600150 > > Since m is an even number, the true proportions of evens and odds should > be exactly 0.5. That's som

Re: [Rd] future time stamps warning

2018-09-20 Thread Jari Oksanen
Could this be a timezone issue (setting the timezone in local computer and communicating this to CRAN): when I look at the email in my computer I see: On Thu, Sep 20, 2018 at 11:46 AM Leo Lahti wrote: -rwxr-xr-x lei/lei1447 2018-09-20 13:23 eurostat/DESCRIPTION Which seems to claim

Re: [Rd] future time stamps warning

2018-09-20 Thread Emil Bode
On Thu, Sep 20, 2018 at 11:46 AM Leo Lahti wrote: > > Time stamps are correct and my system time is correct. How is your timezone set? When I look at your github I see as timestamp for DESCRIPTION today, 1:25 PM GMT+2. (and as I'm writing this, it's 1:12 PM GMT+2) GMT+2 is CEST, if

Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Ralf Stubner
On 9/20/18 1:43 AM, Carl Boettiger wrote: > For a well-tested C algorithm, based on my reading of Lemire, the unbiased > "algorithm 3" in https://arxiv.org/abs/1805.10941 is part already of the C > standard library in OpenBSD and macOS (as arc4random_uniform), and in the > GNU standard library. Le
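
Editorial note: a minimal sketch of drawing an unbiased index by rejection. It uses modulo with rejection, which to my knowledge is the approach taken by OpenBSD's arc4random_uniform; it is not Lemire's multiply-and-shift algorithm from the paper cited above, and it is not code from R. A hypothetical toy generator with 256 equally likely outputs is assumed so that the result can be checked by enumeration; all names are made up for illustration.

```c
#include <assert.h>

/* Toy stand-in for a generator with only 2^8 equally likely outputs.
   This is an assumption for illustration, not one of R's generators. */
#define TOY_RANGE 256u

/* Returns raw % n if raw is accepted, or -1 if raw must be discarded. */
int accept_or_reject(unsigned raw, unsigned n)
{
    assert(n > 0 && n <= TOY_RANGE && raw < TOY_RANGE);
    /* TOY_RANGE % n raw values are left over once the range is cut into
       whole blocks of length n; discarding that many keeps the rest even. */
    unsigned threshold = TOY_RANGE % n;
    if (raw < threshold)
        return -1;
    return (int) (raw % n);
}

/* Draw one index in [0, n) from a caller-supplied source of raw values. */
int unbiased_index(unsigned (*next_raw)(void), unsigned n)
{
    for (;;) {
        int k = accept_or_reject(next_raw(), n);
        if (k >= 0)
            return k;
    }
}

/* Enumeration helper: how many raw values map to k (k = -1 counts rejections). */
int hits_for_index(unsigned n, int k)
{
    int hits = 0;
    for (unsigned raw = 0; raw < TOY_RANGE; raw++)
        hits += (accept_or_reject(raw, n) == k);
    return hits;
}
```

For n = 102 the 52 smallest raw values are rejected and each index is then reached by exactly two of the remaining 204, so every index is equally likely. The price is an occasional extra draw from the generator.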

Re: [Rd] future time stamps warning

2018-09-20 Thread Gábor Csárdi
On Thu, Sep 20, 2018 at 11:46 AM Leo Lahti wrote: > > Time stamps are correct and my system time is correct. > > I have now tried to use Sys.setFileTime() to update time stamps as proposed. > This does not help. > > The windows and debian builds give different reports on the time stamp > issue. > ht

Re: [Rd] future time stamps warning

2018-09-20 Thread Leo Lahti
Time stamps are correct and my system time is correct. I have now tried to use Sys.setFileTime() to update time stamps as proposed. This does not help. The windows and debian builds give different reports on the time stamp issue. https://win-builder.r-project.org/incoming_pretest/eurostat_3.2.8_201

Re: [Rd] future time stamps warning

2018-09-20 Thread Duncan Murdoch
On 20/09/2018 6:56 AM, Leo Lahti wrote: Dear developers, Upon CRAN submission I have bumped into "future file timestamps" warning that I can't solve. I have updated the package as usual, and all checks go through in my system. CRAN reports the following warning however. * checking for future fi

Re: [Rd] future time stamps warning

2018-09-20 Thread Gábor Csárdi
Have you tried setting your system clock correctly and then Sys.setFileTime()? Gabor On Thu, Sep 20, 2018 at 10:58 AM Leo Lahti wrote: > > Dear developers, > > Upon CRAN submission I have bumped into "future file timestamps" warning > that I can't solve. I have updated the package as usual, and a

[Rd] future time stamps warning

2018-09-20 Thread Leo Lahti
Dear developers, Upon CRAN submission I have bumped into "future file timestamps" warning that I can't solve. I have updated the package as usual, and all checks go through in my system. CRAN reports the following warning however. * checking for future file timestamps ... WARNING Files with futur

Re: [Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X

2018-09-20 Thread Gábor Csárdi
This code actually happens to work for me on macOS, but I think in general you cannot rely on performing HTTP requests in fork clusters, i.e. with mclapply(). Fork clusters create worker processes by forking the R process and then _not_ executing another R binary. (Which is often convenient, becau

Re: [Rd] A different error in sample()

2018-09-20 Thread peter dalgaard
Yup, that is a bug, at least in the documentation. Probably a clearer example is x <- seq(2.001, 2.999, length.out=999) threes <- sapply(x, function(y) table(sample(y, 10000, replace=TRUE))[3]) plot(threes, type="l") curve(10000*(x-2)/x, add=TRUE, col="red") which is entirely consistent with wh
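
Editorial note: the red reference curve in the example above can be derived in one line. This assumes that for a non-integer x between 2 and 3 the draw is computed as floor(xU) + 1 with U uniform on [0, 1), which is my reading of the behaviour under discussion rather than something the message preview states. N denotes the number of draws per value of x.

```latex
\[
  P(\text{draw} = 3)
  = P\bigl(\lfloor xU \rfloor = 2\bigr)
  = P\!\left(U \ge \tfrac{2}{x}\right)
  = 1 - \frac{2}{x}
  = \frac{x-2}{x},
  \qquad 2 < x < 3,
\]
\[
  \mathbb{E}\bigl[\#\{\text{draws equal to } 3\}\bigr] = N \cdot \frac{x-2}{x}.
\]
```

Under that assumption the values 1 and 2 each have probability 1/x, and 3 gets the remainder, which is why its share grows from near 0 at x = 2 to near 1/3 at x = 3.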

Re: [Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X

2018-09-20 Thread Martin Maechler
> Seth Russell > on Wed, 19 Sep 2018 15:19:48 -0600 writes: > I have an lapply function call that I want to parallelize. Below is a very > simplified version of the code: > url_base <- "https://cloud.r-project.org/src/contrib/"; > files <- c("A3_1.0.0.tar.gz", "ABC.RA

Re: [Rd] A different error in sample()

2018-09-20 Thread Joris Meys
To be more clear: I do NOT state that the function "round" is used. I read the documentation as "non integer positive numerical values will be replaced by the next smallest integer", the important part being the NEXT smallest integer, i.e. how ceiling() does it. So that's exactly what I would expec

Re: [Rd] A different error in sample()

2018-09-20 Thread Emil Bode
But do we handle it as an error in what sample does, or how the documentation is? I think what is done now would be best described as "ceilinged", i.e. what ceiling() does. But is there an English word to describe this? Or just use "converted to the next smallest integer"? But then again, what h

Re: [Rd] A different error in sample()

2018-09-20 Thread lmo via R-devel
Although it seems to be pretty weird to enter a numeric vector of length one that is not an integer as the first argument to sample(), the results do not seem to match what is documented in the manual. In addition, the results below do not support the use of round rather than truncate in the doc

[Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X

2018-09-20 Thread Seth Russell
I have an lapply function call that I want to parallelize. Below is a very simplified version of the code: url_base <- "https://cloud.r-project.org/src/contrib/"; files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz") res <- parallel::mclapply(files, function(s) download.file(paste0(url_base, s), s

Re: [Rd] A different error in sample()

2018-09-20 Thread Kim Seonghyun
Hi, I have not checked the source code, but I think it is because of banker's round. https://en.wikipedia.org/wiki/Rounding#Round_half_to_even Best regards, Kim -Original Message- From: R-devel On Behalf Of Dario Strbenac Sent: den 20 september 2018 03:00 To: r-devel Subject: Re: [Rd]

Re: [Rd] A different error in sample()

2018-09-20 Thread Martin Maechler
> Wolfgang Huber > on Thu, 20 Sep 2018 08:47:47 +0200 writes: > FWIW, I suspect this is related to the function > R_unif_index that was introduced in src/main/RNG.c around > revision 72356, or the way this function is used in > do_sample in src/main/random.c. Yes, it