On 9/20/18 1:43 AM, Carl Boettiger wrote:
> For a well-tested C algorithm, based on my reading of Lemire, the unbiased
> "algorithm 3" in https://arxiv.org/abs/1805.10941 is part already of the C
> standard library in OpenBSD and macOS (as arc4random_uniform), and in the
> GNU standard library. Lemire also provides C++ code in the appendix of his
> piece for both this and the faster "nearly divisionless" algorithm.
>
> It would be excellent if any R core members were interested in considering
> bindings to these algorithms as a patch, or might express expectations for
> how that patch would have to operate (e.g. re Duncan's comment about
> non-integer arguments to sample size). Otherwise, an R package binding
> seems like a good starting point, but I'm not the right volunteer.

It is difficult to do this in a package, since R does not provide access to
the random bits generated by the RNG. Only a float in (0,1) is available via
unif_rand(). However, if one is willing to use an external RNG, it is of
course possible. After reading about Lemire's work [1], I had planned to
integrate such an unbiased sampling scheme into the dqrng package, which I
have now started. [2]
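For readers who have not looked at the paper, here is a minimal sketch in C of the "nearly divisionless" bounded-integer method as I understand it from Lemire's description. It is not necessarily the code in the dqrng branch. The generator next_u32() is a hypothetical stand-in (a plain xorshift32) for whatever external RNG supplies raw 32-bit words, and the names bounded_rand and rng_state are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for an external RNG that exposes raw 32-bit words.
 * This is a plain xorshift32; it is NOT R's generator and NOT dqrng's. */
static uint32_t rng_state = 2463534242u;

static uint32_t next_u32(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Unbiased integer in [0, range), following the "nearly divisionless"
 * idea: multiply a 32-bit word by range, keep the high 32 bits, and
 * reject the few low-word values that would make some results more
 * likely than others. Assumes range > 0. */
static uint32_t bounded_rand(uint32_t range)
{
    uint64_t m = (uint64_t)next_u32() * (uint64_t)range;
    uint32_t low = (uint32_t)m;

    if (low < range) {
        /* 2^32 mod range, computed in 32-bit unsigned arithmetic.
         * The division only happens on this rare path. */
        uint32_t threshold = (uint32_t)(0u - range) % range;
        while (low < threshold) {
            m = (uint64_t)next_u32() * (uint64_t)range;
            low = (uint32_t)m;
        }
    }
    return (uint32_t)(m >> 32);
}
```

With this sketch, bounded_rand(6) would give a value in 0..5. The point of the design is that no floating-point value in (0,1) is involved at all, which is exactly what cannot be done with unif_rand() alone.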
Using Duncan's example, the results look much better:

> library(dqrng)
> m <- (2/5)*2^32
> y <- dqsample(m, 1000000, replace = TRUE)
> table(y %% 2)

     0      1
500252 499748

Currently I am taking the other interpretation of "truncated":

> table(dqsample(2.5, 1000000, replace = TRUE))

     0      1
499894 500106

I will adjust this to whatever is decided for base R.

However, there is currently neither long vector nor weighted sampling
support. And the performance without replacement is quite bad compared to
R's algorithm with hashing.

cheerio
ralf

[1] via http://www.pcg-random.org/posts/bounded-rands.html
[2] https://github.com/daqana/dqrng/tree/feature/sample

-- 
Ralf Stubner
Senior Software Engineer / Trainer

daqana GmbH
Dortustraße 48
14467 Potsdam

T: +49 331 23 61 93 11
F: +49 331 23 61 93 90
M: +49 162 20 91 196
Mail: ralf.stub...@daqana.com

Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr.: DE300072622
Geschäftsführer: Prof. Dr. Dr. Karl-Kuno Kunze
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel