[Rd] OpenMP and random number generation
Dear all, Now that R has OpenMP facilities, I'm trying to use it for my own package but I'm still wondering if it is safe to use random number generation within a OpenMP block. I looked at the R writing extension document both on the OpenMP and Random number generation but didn't find any information about that. Could someone tell me if it is safe or not please ? Best, Mathieu - I3M, UMR CNRS 5149 Universite Montpellier II, 4 place Eugene Bataillon 34095 Montpellier cedex 5 France http://www.math.univ-montp2.fr/~ribatet Tel: + 33 (0)4 67 14 41 98 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] OpenMP and random number generation
On 22/02/2012 08:23, Mathieu Ribatet wrote: Dear all, Now that R has OpenMP facilities, Hmm, not exactly new. I'm trying to use it for my own package but I'm still wondering if it is safe to use random number generation within a OpenMP block. I looked at the R writing extension document both on the OpenMP and Random number generation but didn't find any information about that. Could someone tell me if it is safe or not please ? Using threads in compiled code called from R (e.g. pthreads or OpenMP) is 'for experts only': we do not document which parts of R are or are not thread-safe except via making the sources available. If you need to ask then it is not safe for you, and in particular it would be unsafe to call Get/PutRNGstate. Best, Mathieu - I3M, UMR CNRS 5149 Universite Montpellier II, 4 place Eugene Bataillon 34095 Montpellier cedex 5 France http://www.math.univ-montp2.fr/~ribatet Tel: + 33 (0)4 67 14 41 98 -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] OpenMP and random number generation
Hello, For your information, I plan to release "soon" a package with a fast and multithreaded aware RNG for C++ code in R packages. It is currently part of one of my (not yet accepted) packages and I want to extract it into its own package. I plan to do some quick benchmarks too. Of course I can not define exactly when it will be ready. Best, Karl On Wed, Feb 22, 2012 at 9:23 AM, Mathieu Ribatet < mathieu.riba...@math.univ-montp2.fr> wrote: > Dear all, > > Now that R has OpenMP facilities, I'm trying to use it for my own package > but I'm still wondering if it is safe to use random number generation > within a OpenMP block. I looked at the R writing extension document both > on the OpenMP and Random number generation but didn't find any information > about that. > > Could someone tell me if it is safe or not please ? > > Best, > Mathieu > > - > I3M, UMR CNRS 5149 > Universite Montpellier II, > 4 place Eugene Bataillon > 34095 Montpellier cedex 5 France > http://www.math.univ-montp2.fr/~ribatet > Tel: + 33 (0)4 67 14 41 98 > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Jazzing up the Task Views index page
Regardless of the page's other merits (looks nice to me), I did enjoy seeing my favorite teacher's (Dev Basu's) elephant in the Bayesian box. Thanks for that. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] OpenMP and random number generation
Hi. > Now that R has OpenMP facilities, I'm trying to use it for my own package but > I'm still wondering if it is safe to use random number generation within a > OpenMP block. I looked at the R writing extension document both on the > OpenMP and Random number generation but didn't find any information about > that. Package CORElearn (i am a coauthor) uses random number generation under OpenMP in C++. This requires to have a separate copy of the generator with its own memory for each thread. Do you want to use it in C or C++? Petr Savicky. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Jazzing up the Task Views index page
Hi, In a way, a simple text based system is clean and is consistent with the plain text requirements of the listservs, and interacting with R through code/text. However, for people with different backgrounds, it can seem unappealing. I definitely believe your page is more inviting. I work at a campus statistical consulting center and we are seeing a substantial increase in researchers and grad students using R. We only recently began supporting R, and one thing I have noticed in both clients and other consultants relatively unfamiliar with R is that they often seem to not look beyond the core. In my opinion, this is partly responsible for the belief that R is difficult to use or that task X is easier to do in program Y than in R. I believe that better marketing and navigability of the task views (which themselves are excellent lists of packages relevant to particular disciplines or applications) can help this perception. Barry, is this a test/example only or would you plan on keeping something like that on your site even if it is not adopted for cran task views? If it is not adopted elsewhere and you are willing to maintain it, I would like to link to it. Very nice work! Cheers, Josh On Tue, Feb 21, 2012 at 4:58 AM, Barry Rowlingson wrote: > A little while ago here we had a short discussion about Task Views - I > think ignited by someone saying 'how many times do I have to say "have > you read the Optimisation Task View?"?' and I poured some fuel on that > fire by saying "Task Views" was a stupid name. > > Anyway, I did say that Task Views were rather brilliant, but were let > down by their hidden position on the R web sites (tucked away as the > third element of a sub-menu of a CRAN mirror site linked to by the > CRAN link from the Download menu on the main R home page). The index > page is rather plain, so I designed a more engaging one. The result of > my effort is now here: > > http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/ > > Having done that, I even considered that something like that could > replace the R Homepage. Giving new visitors an idea of the vast range > of techniques and application areas available in R would seem to be > better than the current graphic which has been there since perhaps > 2004. > > Comments, thoughts, flames, etc? > > Barry > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] portable parallel seeds project: request for critiques
Greetings. Answers below. On Tue, Feb 21, 2012 at 7:04 AM, Petr Savicky wrote: > > Hi. > > In the description of your project in the file > > http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/README > > you argue as follows > > Question: Why is this better than the simple old approach of > setting the seeds within each run with a formula like > > set.seed(2345 + 10 * run) > > Answer: That does allow replication, but it does not assure > that each run uses non-overlapping random number streams. It > offers absolutely no assurance whatsoever that the runs are > actually non-redundant. > > The following demonstrates that the function set.seed() for > the default generator indeed allows to have correlated streams. > > step <- function(x) > { > x[x < 0] <- x[x < 0] + 2^32 > x <- (69069 * x + 1) %% 2^32 > x[x > 2^31] <- x[x > 2^31] - 2^32 > x > } > > n <- 1000 > seed1 <- 124370417 # any seed > seed2 <- step(seed1) > > set.seed(seed1) > x <- runif(n) > set.seed(seed2) > y <- runif(n) > > rbind(seed1, seed2) > table(x[-1] == y[-n]) > > The output is > > [,1] > seed1 124370417 > seed2 205739774 > > FALSE TRUE > 5 994 > > This means that if the streams x, y are generated from the two > seeds above, then y is almost exactly equal to x shifted by 1. > > What is the current state of your project? > > Petr Savicky. Thanks for the example. That is exactly the point I've been making about the old, ordinary way of setting seeds. I continue testing on my plan, I believe it does work. The example implementation is successful. I have not received any critiques that make me think it is theoretically wrong, but I also have not received feedback from any body who has very detailed knowledge of R's parallel package to tell me that my approach is good. In order for this to be easy for users, I need to put the init streams and set current stream functions into a package, and then streamline the process of creating the seed array. My opinion is that CRAN is now overflowed with too many "one function" packages, I don't want to create another one just for these two little functions, and I may roll it into my general purpose regression package called "rockchalk". If I worked in a company and had money riding on this, I'd need to hire someone from R Core to stare at the way I'm setting, saving, and re-setting the .Random.seed variable to be completely sure there are not complications. Because parallel package is so new, there is relatively little documentation on how it is supposed to work. I'm just gazing at the source code and trying not to break anything it does, and trying to stop it from breaking anything I do. One technical issue that has been raised to me is that R parallel's implementation of the L'Ecuyer generator is based on integer valued variables, whereas the original L'Ecuyer code uses double real variables. But I'm trusting the R Core on this, if they code the generator in a way that is good enough for R itself, it is good enough for me. (And if that's wrong, we'll all find out together :) ). Josef Leydold (the rstream package author) has argued that R's implementation runs more slowly than it ought to. We had some correspondence and I tracked a few threads in forums. It appears the approach suggested there is roadblocked by some characteristics deep down in R and the way random streams are managed. Packages have only a limited, tenuous method to replace R's generators with their own generators. Here is the comment that L'Ecuyer and Leydold make in the document that is distributed with rstream.. "rstream: Streams of Random Numbers for Stochastic Simulation" "There is one drawback in the implementation. R uses a static table to implement access to different kinds of random number generators. Only one slot (RNGkind(kind="user-supplied")) has been provided for users who want to add her own generator. This slot relies on a pointer to a function called user_unif_rand. Thus rstream has to abuse this slot to add additional random number generators. However, there are other packages on CRAN that use the same trick (randaes, rlecuyer, rsprng, SuppDists) or load one of these packages (Rmpi, rpvm, rpart, snow, varSelRF). Thus if a user loads two of these packages or adds her own uniform random number generator, the behavior of at least one of the packages is broken. This problem could be fixed by some changes in the R core system, either by adding a package variableto RNGkind(kind="user-supplied") similar to the .Call function, or by replacing the static table by a dynamic one. I interpret all of this to mean that, although other approaches may work as well or better, the only approach that can "defend itself " in the ecology of a running R program is an approach that fiddles with R's own generators, rather than tries to replace them. And that's what I'm doing. pj ps Just this moment, while googling for Leydold's comments,
Re: [Rd] Private fields in reference classes (was: Private Variables in R5-Classes possible?)
(Can we please follow the terminology in the ?ReferenceClasses documentation. Creativity is fine but here we need communication) This has come up (once) before off-list. It could be made part of reference classes via a general rewriting that would improve efficiency as well. Not imminent, but it would be good to know of other applications, including why they are useful. Meanwhile, it could be implemented for a subclass of reference classes, by adding a special field that lists the private (or public, if you like) fields, and then implementing S4 methods for "$" and "$<-" that check there for disallowed/allowed names. As presumably one wants, this leaves methods in the class free to use the private fields. John On 2/21/12 3:07 AM, Claus Jonathan Fritzemeier wrote: Hi list, is there a way to define some kind of private Variable? I would like to prevent the user from manipulating fields on his own, in order to not destroy data structures. The problem is, that as soon as the variable exists in the environment it is accessible via t$secret_value. test <- setRefClass("test", fields=list( secret = function(value){ cat("the function was called\n") if(missing(value)){ if(exists("secret_value", envir=.self@.xData)){ return(get("secret_value", envir=.self@.xData, inherits = F)) } else{ return(NULL) } } assign("secret_value", value=value, , envir=.self@.xData) } ) ) > t <- test$new() > t$secret the function was called NULL > t$secret_value Error in envRefInferField(x, what, getClass(class(x)), selfEnv) : "secret_value" is not a valid field or method name for reference class “test” > t$secret <- "Blub" the function was called > t$secret_value [1] "Blub" __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] portable parallel seeds project: request for critiques
On Wed, Feb 22, 2012 at 12:17:25PM -0600, Paul Johnson wrote: [...] > In order for this to be easy for users, I need to put the init streams > and set current stream functions into a package, and then streamline > the process of creating the seed array. My opinion is that CRAN is > now overflowed with too many "one function" packages, I don't want to > create another one just for these two little functions, and I may roll > it into my general purpose regression package called "rockchalk". I am also preparing a solution to the problem. One is based on AES used for initialization of the R base Mersenne-Twister generator, so it only replaces set.seed() function. Another solution is based on "rlecuyer" package. I suggest to discuss the possible solutions off-list before submitting to CRAN. > One technical issue that has been raised to me is that R parallel's > implementation of the L'Ecuyer generator is based on integer valued > variables, whereas the original L'Ecuyer code uses double real > variables. But I'm trusting the R Core on this, if they code the > generator in a way that is good enough for R itself, it is good enough > for me. (And if that's wrong, we'll all find out together :) ). I do not know about any L'Ecuyer's generator in R base. You probably mean the authors of the extension packages with these generators. > Josef Leydold (the rstream package author) has argued that R's > implementation runs more slowly than it ought to. We had some > correspondence and I tracked a few threads in forums. It appears the > approach suggested there is roadblocked by some characteristics deep > down in R and the way random streams are managed. Packages have only > a limited, tenuous method to replace R's generators with their own > generators. In order to connect a user defined generator to R, there are two obligatory entry points "user_unif_rand" and "user_unif_init". The first allows to call the generator from runif() and the similar functions. The second connects the generator to set.seed() function. If there is only one extension package with a generator loaded to an R session, then these entry points are good enough. If the package provides several generators, like "randtoolbox", it is possible to change between them easily using functions provided by the package for this purpose. I think that having several packages with generators simultaneously can be good for their development, but this is not needed for their use in applications. There are also two other entry points "user_unif_nseed" and "user_unif_seedloc", which allow to support the variable ".Random.seed". A problem with this is that R converts the internal state of the generator to ".Random.seed" by reading a specific memory location, but does not alert the package about this event. So, if the state requires a transformation to integer before storing to ".Random.seed", it is not possible to do this only when needed. In the package "rngwell19937", i included some code that tries to determine, whether the user changed ".Random.seed" or not. The reason is that most of the state is integer and is stored to ".Random.seed", but the state contains also a function pointer, which is not stored. It can be recomputed from ".Random.seed" and this recomputing is done, if the package detects a change of ".Random.seed". This is not a nice solution. So in "randtoolbox" we decided not to support ".Random.seed". I understand that in the area of parallel computing, the work with ".Random.seed" is a good paradigm, but if the generator provides other tools for converting the state to an R object and put it back to the active state, then ".Random.seed" is not strictly necessary. > Parallel Random Number Generation in C++ and R Using RngStream > Andrew Karl · Randy Eubank · Dennis Young > http://math.la.asu.edu/~eubank/webpage/rngStreamPaper.pdf Thank you very much for this link. All the best, Petr. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Silently loading and Depends: versus NAMESPACE imports
Dirk - I'm having the same issue. Could you provide the details of your solution? On Sat, Jan 28, 2012 at 11:15 AM, Dirk Eddelbuettel wrote: > > On 28 January 2012 at 16:52, Uwe Ligges wrote: > | > | > | On 27.01.2012 15:57, Dirk Eddelbuettel wrote: > | > > | > On 12 January 2012 at 12:12, Hervé Pagès wrote: > | > | Hi Dirk, > | > | > | > | On 01/11/2012 11:42 AM, Dirk Eddelbuettel wrote: > | > |> > | > |> R CMD check really hates it when my .onLoad() function contains > | > |> suppressMessages(library(foo)) > | > | > | > | Note that you can always fool 'R CMD check' by doing something like: > | > | > | > | sillyname<- library > | > | suppressMessages(sillyname("foo")) > | > | > | > | Also isn't suppressPackageStartupMessages() more appropriate? > | > | > | > |> > | > |> However, _and for non-public packages not going to CRAN_ I prefer > doing this > | > |> over using explicit Depends or import statements in the NAMESPACE > file as the > | > |> latter do not give me an ability to make the loading less verbose. > With the > | > |> R universe of packages being as vast as at is, a simple > (non-public) package > | > |> I have loads about five or six other packages explicitly, each of > which loads > | > |> even more. The net result is totally intimidating _sixty lines > full_ of > | > |> verbose noise that is meaningful to me as an R programmer, but not > for the > | > |> colleagues expected to use the packages. It looks rather > uninviting, frankly. > | > |> > | > |> How do I use imports via NAMESPACE, and yet keep the noise level > down to zero? > | > | > | > | If you only need to import foo (i.e. and actually don't need to > attach > | > | it to the search path) then putting foo in Imports and using import > | > | statements in NAMESPACE will keep the noise level down to zero. > | > > | > I don't think so. > | > > | > I have an internal package, call it fooUtils, that (among other > things) needs > | > to figure at startup whether it runs on this or that OS. > | > > | > So that package fooUtils does > | > > | > .onLoad<- function(libname, pkgname) { > | > > | > if (.Platform$OS.type == "windows") { > | > packageStartupMessage("Running on Windows") > | > # [... more stuff here ... ] > | > } else if (.Platform$OS.type == "unix") { > | > packageStartupMessage("Running on Linux") > | > # [... more stuff here ... ] > | > } else { > | > warning("Platform ", .Platform$OS.type, " not recognised") > | > drives<- NULL > | > } > | > > | > # > | > > | > } > | > | Are you sure you want the messages in .onLoad rather than .onAttach? > > Thanks Uwe -- looks like that was exactly the hint I needed. > > By splitting the task across onLoad and onAttach I seem to be able to get > want I need even if the package is "tickled" via NAMESPACE's importFrom. > > Dirk > > | See ?.onLoad and its "Good practice" section: > | > | "Loading a namespace should where possible be silent, with startup > | messages given by .onAttach. These messages (and any essential ones from > | .onLoad) should use packageStartupMessage so they can be silenced where > | they would be a distraction." > | > | Best, > | Uwe > | > | > | > | > > | > and contrary to your claim, this is not silent as soon as I do > | > > | > > | > importFrom(fooUtils, someThing) > | > > | > > | > the messages above pop up. While I can suppress them for 'normal' > loads via > | > > | > suppressMessages(library(fooUtils)) > | > > | > or > | > > | > suppressPackageStartupMessages(library(fooUtils)) > | > > | > > | > I cannot suppress them via NAMESPACE imports. > | > > | > Dirk > | > > | > | So I guess your question is: how do we suppress package startup > messages > | > | for packages listed in Depends? > | > | > | > | Cheers, > | > | H. > | > | > | > |> > | > |> Dirk > | > |> > | > | > | > | > | > | -- > | > | Hervé Pagès > | > | > | > | Program in Computational Biology > | > | Division of Public Health Sciences > | > | Fred Hutchinson Cancer Research Center > | > | 1100 Fairview Ave. N, M1-B514 > | > | P.O. Box 19024 > | > | Seattle, WA 98109-1024 > | > | > | > | E-mail: hpa...@fhcrc.org > | > | Phone: (206) 667-5791 > | > | Fax:(206) 667-1319 > | > > > -- > "Outside of a dog, a book is a man's best friend. Inside of a dog, it is > too > dark to read." -- Groucho Marx > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel