[Rd] Replacing the Random Number Generator in Stand Alone Library
Hi R-Developers, I had a question about the random number generator used in the R StandAlone Math Library. The stand-alone library depends on the unif_rand() function for most simulated values, and this function is provided in the sunif.c file in the relevant directory. At present, this program implements the "Marsaglia-Multicarry" algorithm, which is described throughout the R documentation as: "A multiply-with-carry RNG is used, as recommended by George Marsaglia in his post to the mailing list 'sci.stat.math'. It has a period of more than 2^60 and has passed all tests (according to Marsaglia). The seed is two integers (all values allowed)." However, I do not think this RNG actually passes all tests. For example, the Handbook of Computational Econometrics (illegal web copy at link below), shows that it fails the mtuple test and gives an explicit example where it leads to problems because it failed this test. The mtuple test was introduced by Marsaglia in 1985, and I gather he wrote his mailing list comment that it "passes all tests" sometime after this, so I am not sure what explains this distinction (though I am not sure if the mtuple test is included in the diehard tests, which he may have been what he was referring to). However, there are clearly some areas where this PRNG runs in to trouble (although the books example is better, another problem is that it can't seem to simulate a value above (1/2)^1+(1/4)^4) after simulating a value below 1e-6. Given that the Mersenne Twister seems to be the standard for simulation these days (and used as the default in R), it seems like it might be useful to change the stand alone library so it also uses this routine. I gather this would be pretty easy to do by pulling this function from the RNG.c file and moving it into the sunif.c file, and have a prototype of this. However, I thought I would ask, is there a reason this hasn't been done? Or is it just a historical carry-over (pun intended I suppose). Warm wishes, Nigel Research Fellow Massachusetts General Hospital / Broad Institute Book link: http://thenigerianprofessionalaccountant.files.wordpress.com/2013/04/handboo k-of-computational-econometrics-belsley.pdf __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] FW: Replacing the Random Number Generator in Stand Alone Library
Okay, so I am guessing everyone had the same response I initially did when hearing that this RNG might not be so hot*... as an alternate question, if I submitted a patch to replace the current RNG with the twister, would it be accepted? -N -Original Message- From: Nigel Delaney [mailto:nigel.dela...@outlook.com] Sent: Thursday, October 10, 2013 5:04 PM To: r-devel@r-project.org Subject: Replacing the Random Number Generator in Stand Alone Library Hi R-Developers, I had a question about the random number generator used in the R StandAlone Math Library. The stand-alone library depends on the unif_rand() function for most simulated values, and this function is provided in the sunif.c file in the relevant directory. At present, this program implements the "Marsaglia-Multicarry" algorithm, which is described throughout the R documentation as: "A multiply-with-carry RNG is used, as recommended by George Marsaglia in his post to the mailing list 'sci.stat.math'. It has a period of more than 2^60 and has passed all tests (according to Marsaglia). The seed is two integers (all values allowed)." However, I do not think this RNG actually passes all tests. For example, the Handbook of Computational Econometrics (illegal web copy at link below), shows that it fails the mtuple test and gives an explicit example where it leads to problems because it failed this test. The mtuple test was introduced by Marsaglia in 1985, and I gather he wrote his mailing list comment that it "passes all tests" sometime after this, so I am not sure what explains this distinction (though I am not sure if the mtuple test is included in the diehard tests, which he may have been what he was referring to). However, there are clearly some areas where this PRNG runs in to trouble (although the books example is better, another problem is that it can't seem to simulate a value above (1/2)^1+(1/4)^4) after simulating a value below 1e-6. Given that the Mersenne Twister seems to be the standard for simulation these days (and used as the default in R), it seems like it might be useful to change the stand alone library so it also uses this routine. I gather this would be pretty easy to do by pulling this function from the RNG.c file and moving it into the sunif.c file, and have a prototype of this. However, I thought I would ask, is there a reason this hasn't been done? Or is it just a historical carry-over (pun intended I suppose). Warm wishes, Nigel Research Fellow Massachusetts General Hospital / Broad Institute Book link: http://thenigerianprofessionalaccountant.files.wordpress.com/2013/04/handboo k-of-computational-econometrics-belsley.pdf __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Embedded R Fails When Run on LSF Queue System
Hi, I had a quick question I was hoping someone might be able to answer, as my journey through the source code so far has not been very profitable. I have a command line program that embeds R, and the program works just fine when run from the command line. However, when I run the program as a job in the LSF cluster at my institute, the following error occurs: Fatal error: you must specify '--save', '--no-save' or '--vanilla' This is confusing becasue I believe I have set this by changing the startup parameters so that the no-save option is in use. I initialize R with the following collection of function calls: 1-Rf_initialize_R 2-R_SetParams 3-setup_Rmainloop At step 2 I pass in a structure that specifies the no-save option (as well as interactive=true, verbose=false), so I would expect it to be okay, but I still receive this error. Again, not at the terminal but only when run as a separate process. I gathered from this thread that someone else has experienced this (https://stat.ethz.ch/pipermail/r-help/2008-March/155935.html), but am not sure how best to rectify it as I believe I have already changed the setting. Does anyone know what a good solution or work around might be? Thanks for any help, Nigel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Embedded R Fails When Run on LSF Queue System
Thanks Dirk. I'll confess I am mostly trying to follow the practices outlined in the writing R-extensions manual and books by Chambers and Gentleman, but have found some conflicting advice on the internet in various places, and am also trying to build off a previously existing code base (that at least worked on some operating-systems/architectures/scenarios, but seems to break when I move the program around to any new type of environment). Will definitely re-write that bit. Sincere thanks again. Out of curiosity, can I ask one more question since I gather you have much more expertise than myself? The extension manual has a somewhat obscure, to me at least, section on threading issues (8.1.5). My understanding is that because the R_CStackStart is a single value, if R is called from multiple threads (even if never simultaneously), it is essentially best to just disable the stack check rather than try to re-write this value each time, which I gather might need to be done (and simply increasing the stack size of the calling thread will not help). Is this the approach you have taken in your projects? Or do you think this is also a bad idea? -Original Message- From: Dirk Eddelbuettel [mailto:e...@debian.org] Sent: Sunday, November 10, 2013 8:50 PM To: Nigel Delaney Cc: r-devel@r-project.org Subject: Re: [Rd] Embedded R Fails When Run on LSF Queue System On 10 November 2013 at 16:32, Nigel Delaney wrote: | I had a quick question I was hoping someone might be able to answer, as my journey through the source code so far has not been very profitable. I have a command line program that embeds R, and the program works just fine when run from the command line. However, when I run the program as a job in the LSF cluster at my institute, the following error occurs: | | Fatal error: you must specify '--save', '--no-save' or '--vanilla' | | This is confusing becasue I believe I have set this by changing the startup parameters so that the no-save option is in use. I initialize R with the following collection of function calls: | | 1-Rf_initialize_R | 2-R_SetParams | 3-setup_Rmainloop | | At step 2 I pass in a structure that specifies the no-save option (as well as interactive=true, verbose=false), so I would expect it to be okay, but I still receive this error. Again, not at the terminal but only when run as a separate process. | | I gathered from this thread that someone else has experienced this (https://stat.ethz.ch/pipermail/r-help/2008-March/155935.html), but am not sure how best to rectify it as I believe I have already changed the setting. | | Does anyone know what a good solution or work around might be? As a first approximation, when the manual tells you do something a certain way, you are in fact better off doing it that way. I stand behind two projects embedding R: littler (with Jeff Horner), and RInside (with Romain Francois). Both do this explicitly. From littler.c: char *R_argv[] = {(char*)programName, "--gui=none", "--no-restore", "--no-save", "--no-readline", "--silent", "", ""}; char *R_argv_opt[] = {"--vanilla", "--slave"}; int R_argc = (sizeof(R_argv) - sizeof(R_argv_opt) ) / sizeof(R_argv[0]); [...] /* some logic to add R_argv_opt parts to R_argv omitted (/ [...] Rf_initEmbeddedR(R_argc, R_argv); /* Initialize the embedded R interpreter */ Hope this helps, Dirk -- Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel