[Rd] OpenMP and random number generation

2012-02-22 Thread Mathieu Ribatet
Dear all,

Now that R has OpenMP facilities, I'm trying to use it for my own package but 
I'm still wondering if it is safe to use random number generation within a 
OpenMP block. I looked at the R writing extension document  both on the OpenMP 
and Random number generation but didn't find any information about that.

Could someone tell me if it is safe or not please ?

Best,
Mathieu

- 
I3M, UMR CNRS 5149
Universite Montpellier II,
4 place Eugene Bataillon
34095 Montpellier cedex 5   France
http://www.math.univ-montp2.fr/~ribatet
Tel: + 33 (0)4 67 14 41 98

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] OpenMP and random number generation

2012-02-22 Thread Prof Brian Ripley

On 22/02/2012 08:23, Mathieu Ribatet wrote:

Dear all,

Now that R has OpenMP facilities,


Hmm, not exactly new.


I'm trying to use it for my own package but I'm still wondering if it
is safe to use random number generation within a OpenMP block. I looked 
at the R writing extension document  both on the OpenMP and Random 
number generation but didn't find any information about that.


Could someone tell me if it is safe or not please ?


Using threads in compiled code called from R (e.g. pthreads or OpenMP) 
is 'for experts only': we do not document which parts of R are or are 
not thread-safe except via making the sources available.  If you need to 
ask then it is not safe for you, and in particular it would be unsafe to 
call Get/PutRNGstate.



Best,
Mathieu

-
I3M, UMR CNRS 5149
Universite Montpellier II,
4 place Eugene Bataillon
34095 Montpellier cedex 5   France
http://www.math.univ-montp2.fr/~ribatet
Tel: + 33 (0)4 67 14 41 98


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] OpenMP and random number generation

2012-02-22 Thread Karl Forner
Hello,

For your information, I plan to release "soon" a package with a fast and
multithreaded aware RNG for C++ code in R packages.
It is currently part of one of my (not yet accepted) packages and I want to
extract it into its own package.
I plan to do some quick benchmarks too.

Of course I can not define exactly when it will be ready.

Best,
Karl

On Wed, Feb 22, 2012 at 9:23 AM, Mathieu Ribatet <
mathieu.riba...@math.univ-montp2.fr> wrote:

> Dear all,
>
> Now that R has OpenMP facilities, I'm trying to use it for my own package
> but I'm still wondering if it is safe to use random number generation
> within a OpenMP block. I looked at the R writing extension document  both
> on the OpenMP and Random number generation but didn't find any information
> about that.
>
> Could someone tell me if it is safe or not please ?
>
> Best,
> Mathieu
>
> -
> I3M, UMR CNRS 5149
> Universite Montpellier II,
> 4 place Eugene Bataillon
> 34095 Montpellier cedex 5   France
> http://www.math.univ-montp2.fr/~ribatet
> Tel: + 33 (0)4 67 14 41 98
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Jazzing up the Task Views index page

2012-02-22 Thread Brett Presnell

Regardless of the page's other merits (looks nice to me), I did enjoy
seeing my favorite teacher's (Dev Basu's) elephant in the Bayesian box.
Thanks for that.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] OpenMP and random number generation

2012-02-22 Thread Petr Savicky
Hi.

> Now that R has OpenMP facilities, I'm trying to use it for my own package but 
> I'm still wondering if it is safe to use random number generation within a 
> OpenMP block. I looked at the R writing extension document  both on the 
> OpenMP and Random number generation but didn't find any information about 
> that.

Package CORElearn (i am a coauthor) uses random number generation
under OpenMP in C++. This requires to have a separate copy of the
generator with its own memory for each thread.

Do you want to use it in C or C++?

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Jazzing up the Task Views index page

2012-02-22 Thread Joshua Wiley
Hi,

In a way, a simple text based system is clean and is consistent with
the plain text requirements of the listservs, and interacting with R
through code/text.  However, for people with different backgrounds, it
can seem unappealing.  I definitely believe your page is more
inviting.

I work at a campus statistical consulting center and we are seeing a
substantial increase in researchers and grad students using R.  We
only recently began supporting R, and one thing I have noticed in both
clients and other consultants relatively unfamiliar with R is that
they often seem to not look beyond the core.  In my opinion, this is
partly responsible for the belief that R is difficult to use or that
task X is easier to do in program Y than in R.  I believe that better
marketing and navigability of the task views (which themselves are
excellent lists of packages relevant to particular disciplines or
applications) can help this perception.

Barry, is this a test/example only or would you plan on keeping
something like that on your site even if it is not adopted for cran
task views?  If it is not adopted elsewhere and you are willing to
maintain it, I would like to link to it.

Very nice work!

Cheers,

Josh

On Tue, Feb 21, 2012 at 4:58 AM, Barry Rowlingson
 wrote:
> A little while ago here we had a short discussion about Task Views - I
> think ignited by someone saying 'how many times do I have to say "have
> you read the Optimisation Task View?"?' and I poured some fuel on that
> fire by saying "Task Views" was a stupid name.
>
> Anyway, I did say that Task Views were rather brilliant, but were let
> down by their hidden position on the R web sites (tucked away as the
> third element of a sub-menu of a CRAN mirror site linked to by the
> CRAN link from the Download menu on the main R home page). The index
> page is rather plain, so I designed a more engaging one. The result of
> my effort is now here:
>
> http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
>
> Having done that, I even considered that something like that could
> replace the R Homepage. Giving new visitors an idea of the vast range
> of techniques and application areas available in R would seem to be
> better than the current graphic which has been there since perhaps
> 2004.
>
> Comments, thoughts, flames, etc?
>
> Barry
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] portable parallel seeds project: request for critiques

2012-02-22 Thread Paul Johnson
Greetings. Answers below.

On Tue, Feb 21, 2012 at 7:04 AM, Petr Savicky  wrote:

>
> Hi.
>
> In the description of your project in the file
>
>  http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/README
>
> you argue as follows
>
>  Question: Why is this better than the simple old approach of
>  setting the seeds within each run with a formula like
>
>  set.seed(2345 + 10 * run)
>
>  Answer: That does allow replication, but it does not assure
>  that each run uses non-overlapping random number streams. It
>  offers absolutely no assurance whatsoever that the runs are
>  actually non-redundant.
>
> The following demonstrates that the function set.seed() for
> the default generator indeed allows to have correlated streams.
>
>  step <- function(x)
>  {
>      x[x < 0] <- x[x < 0] + 2^32
>      x <- (69069 * x + 1) %% 2^32
>      x[x > 2^31] <- x[x > 2^31] - 2^32
>      x
>  }
>
>  n <- 1000
>  seed1 <- 124370417 # any seed
>  seed2 <- step(seed1)
>
>  set.seed(seed1)
>  x <- runif(n)
>  set.seed(seed2)
>  y <- runif(n)
>
>  rbind(seed1, seed2)
>  table(x[-1] == y[-n])
>
> The output is
>
>             [,1]
>  seed1 124370417
>  seed2 205739774
>
>  FALSE  TRUE
>      5   994
>
> This means that if the streams x, y are generated from the two
> seeds above, then y is almost exactly equal to x shifted by 1.
>
> What is the current state of your project?
>
> Petr Savicky.


Thanks for the example. That is exactly the point I've been making
about the old, ordinary way of setting seeds.

I continue testing on my plan, I believe it does work. The example
implementation is successful.  I have not received any critiques that
make me think it is theoretically wrong, but I also have not received
feedback from any body who has very detailed knowledge of R's parallel
package to tell me that my approach is good.

In order for this to be easy for users, I need to put the init streams
and set current stream functions into a package, and then streamline
the process of creating the seed array.  My opinion is that CRAN is
now overflowed with too many "one function" packages, I don't want to
create another one just for these two little functions, and I may roll
it into my general purpose regression package called "rockchalk".

If I worked in a company and had money riding on this, I'd need to
hire someone from R Core to stare at the way I'm setting, saving, and
re-setting the .Random.seed variable to be completely sure there are
not complications.  Because parallel package is so new, there is
relatively little documentation on how it is supposed to work. I'm
just gazing at the source code and trying not to  break anything it
does, and trying to stop it from breaking anything I do.

One technical issue that has been raised to me is that R parallel's
implementation of the L'Ecuyer generator is based on integer valued
variables, whereas the original L'Ecuyer code uses double real
variables.  But I'm trusting the R Core on this, if they code the
generator in a way that is good enough for R itself, it is good enough
for me. (And if that's wrong, we'll all find out together :) ).

Josef Leydold (the rstream package author) has argued that R's
implementation runs more slowly than it ought to. We had some
correspondence and I tracked a few threads in forums. It appears the
approach suggested there is roadblocked by some characteristics deep
down in R and the way random streams are managed.  Packages have only
a limited, tenuous method to replace R's generators with their own
generators. Here is the comment  that L'Ecuyer and Leydold make in the
document that is distributed with rstream..  "rstream: Streams of
Random Numbers for Stochastic Simulation"

"There is one drawback in the implementation. R uses a static table to
implement access to different
kinds of random number generators. Only one slot
(RNGkind(kind="user-supplied")) has been provided
for users who want to add her own generator. This slot relies on a
pointer to a function
called user_unif_rand. Thus rstream has to abuse this slot to add
additional random number
generators. However, there are other packages on CRAN that use the
same trick (randaes, rlecuyer,
rsprng, SuppDists) or load one of these packages (Rmpi, rpvm, rpart,
snow, varSelRF). Thus if a
user loads two of these packages or adds her own uniform random number
generator, the behavior
of at least one of the packages is broken. This problem could be fixed
by some changes in the R
core system, either by adding a package variableto
RNGkind(kind="user-supplied") similar to the
.Call function, or by replacing the static table by a dynamic one.

I interpret all of this to mean that, although other approaches may
work as well or better,
the only approach that can "defend itself " in the ecology of a
running R program is an approach
that fiddles with R's own generators, rather than tries to replace
them. And that's what I'm doing.

pj

ps

Just this moment, while googling for Leydold's comments,

Re: [Rd] Private fields in reference classes (was: Private Variables in R5-Classes possible?)

2012-02-22 Thread John Chambers
(Can we please follow the terminology in the ?ReferenceClasses 
documentation.  Creativity is fine but here we need communication)


This has come up (once) before off-list.  It could be made part of 
reference classes via a general rewriting that would improve efficiency 
as well.  Not imminent, but it would be good to know of other 
applications, including why they are useful.


Meanwhile, it could be implemented for a subclass of reference classes, 
by adding a special field that lists the private (or public, if you 
like) fields, and then implementing  S4 methods for "$"  and "$<-" that 
check there for disallowed/allowed names.  As presumably one wants, this 
leaves  methods in the class free to use the private fields.


John

On 2/21/12 3:07 AM, Claus Jonathan Fritzemeier wrote:

Hi list,

is there a way to define some kind of private Variable?

I would like to prevent the user from manipulating fields on his own, 
in order to not destroy data structures.


The problem is, that as soon as the variable exists in the environment 
it is accessible via t$secret_value.


test <- setRefClass("test", fields=list(
secret = function(value){
cat("the function was called\n")
if(missing(value)){
if(exists("secret_value", envir=.self@.xData)){
return(get("secret_value", envir=.self@.xData, inherits = F))
}
else{
return(NULL)
}
}
assign("secret_value", value=value, , envir=.self@.xData)
}
) )

> t <- test$new()
> t$secret
the function was called
NULL
> t$secret_value
Error in envRefInferField(x, what, getClass(class(x)), selfEnv) :
"secret_value" is not a valid field or method name for reference class 
“test”

> t$secret <- "Blub"
the function was called
> t$secret_value
[1] "Blub"

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] portable parallel seeds project: request for critiques

2012-02-22 Thread Petr Savicky
On Wed, Feb 22, 2012 at 12:17:25PM -0600, Paul Johnson wrote:
[...]
> In order for this to be easy for users, I need to put the init streams
> and set current stream functions into a package, and then streamline
> the process of creating the seed array.  My opinion is that CRAN is
> now overflowed with too many "one function" packages, I don't want to
> create another one just for these two little functions, and I may roll
> it into my general purpose regression package called "rockchalk".

I am also preparing a solution to the problem. One is based on AES
used for initialization of the R base Mersenne-Twister generator,
so it only replaces set.seed() function. Another solution is based
on "rlecuyer" package. I suggest to discuss the possible solutions
off-list before submitting to CRAN.

> One technical issue that has been raised to me is that R parallel's
> implementation of the L'Ecuyer generator is based on integer valued
> variables, whereas the original L'Ecuyer code uses double real
> variables.  But I'm trusting the R Core on this, if they code the
> generator in a way that is good enough for R itself, it is good enough
> for me. (And if that's wrong, we'll all find out together :) ).

I do not know about any L'Ecuyer's generator in R base. You probably
mean the authors of the extension packages with these generators.

> Josef Leydold (the rstream package author) has argued that R's
> implementation runs more slowly than it ought to. We had some
> correspondence and I tracked a few threads in forums. It appears the
> approach suggested there is roadblocked by some characteristics deep
> down in R and the way random streams are managed.  Packages have only
> a limited, tenuous method to replace R's generators with their own
> generators.

In order to connect a user defined generator to R, there are two
obligatory entry points "user_unif_rand" and "user_unif_init".
The first allows to call the generator from runif() and the similar
functions. The second connects the generator to set.seed() function.
If there is only one extension package with a generator loaded
to an R session, then these entry points are good enough. If the
package provides several generators, like "randtoolbox", it is
possible to change between them easily using functions provided
by the package for this purpose. I think that having several
packages with generators simultaneously can be good for their
development, but this is not needed for their use in applications.

There are also two other entry points "user_unif_nseed" and
"user_unif_seedloc", which allow to support the variable ".Random.seed".
A problem with this is that R converts the internal state of the
generator to ".Random.seed" by reading a specific memory location,
but does not alert the package about this event. So, if the state
requires a transformation to integer before storing to ".Random.seed",
it is not possible to do this only when needed.

In the package "rngwell19937", i included some code that tries to
determine, whether the user changed ".Random.seed" or not. The reason
is that most of the state is integer and is stored to ".Random.seed",
but the state contains also a function pointer, which is not stored.
It can be recomputed from ".Random.seed" and this recomputing is done,
if the package detects a change of ".Random.seed". This is not a nice
solution. So in "randtoolbox" we decided not to support ".Random.seed".

I understand that in the area of parallel computing, the work
with ".Random.seed" is a good paradigm, but if the generator
provides other tools for converting the state to an R object
and put it back to the active state, then ".Random.seed" is not
strictly necessary.

> Parallel Random Number Generation in C++ and R Using RngStream
> Andrew Karl · Randy Eubank · Dennis Young
> http://math.la.asu.edu/~eubank/webpage/rngStreamPaper.pdf

Thank you very much for this link.

All the best, Petr.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Silently loading and Depends: versus NAMESPACE imports

2012-02-22 Thread Suraj Gupta
Dirk - I'm having the same issue.  Could you provide the details of your
solution?

On Sat, Jan 28, 2012 at 11:15 AM, Dirk Eddelbuettel  wrote:

>
> On 28 January 2012 at 16:52, Uwe Ligges wrote:
> |
> |
> | On 27.01.2012 15:57, Dirk Eddelbuettel wrote:
> | >
> | > On 12 January 2012 at 12:12, Hervé Pagès wrote:
> | > | Hi Dirk,
> | > |
> | > | On 01/11/2012 11:42 AM, Dirk Eddelbuettel wrote:
> | > |>
> | > |>  R CMD check really hates it when my .onLoad() function contains
> | > |>   suppressMessages(library(foo))
> | > |
> | > | Note that you can always fool 'R CMD check' by doing something like:
> | > |
> | > |  sillyname<- library
> | > |  suppressMessages(sillyname("foo"))
> | > |
> | > | Also isn't suppressPackageStartupMessages() more appropriate?
> | > |
> | > |>
> | > |>  However, _and for non-public packages not going to CRAN_ I prefer
> doing this
> | > |>  over using explicit Depends or import statements in the NAMESPACE
> file as the
> | > |>  latter do not give me an ability to make the loading less verbose.
>  With the
> | > |>  R universe of packages being as vast as at is, a simple
> (non-public) package
> | > |>  I have loads about five or six other packages explicitly, each of
> which loads
> | > |>  even more.  The net result is totally intimidating _sixty lines
> full_ of
> | > |>  verbose noise that is meaningful to me as an R programmer, but not
> for the
> | > |>  colleagues expected to use the packages. It looks rather
> uninviting, frankly.
> | > |>
> | > |>  How do I use imports via NAMESPACE, and yet keep the noise level
> down to zero?
> | > |
> | > | If you only need to import foo (i.e. and actually don't need to
> attach
> | > | it to the search path) then putting foo in Imports and using import
> | > | statements in NAMESPACE will keep the noise level down to zero.
> | >
> | > I don't think so.
> | >
> | > I have an internal package, call it fooUtils, that (among other
> things) needs
> | > to figure at startup whether it runs on this or that OS.
> | >
> | > So that package fooUtils does
> | >
> | >  .onLoad<- function(libname, pkgname) {
> | >
> | >  if (.Platform$OS.type == "windows") {
> | >  packageStartupMessage("Running on Windows")
> | > # [... more stuff here ... ]
> | >  } else if (.Platform$OS.type == "unix") {
> | >  packageStartupMessage("Running on Linux")
> | > # [... more stuff here ... ]
> | >  } else {
> | >  warning("Platform ", .Platform$OS.type, " not recognised")
> | >  drives<- NULL
> | >  }
> | >
> | >  # 
> | >
> | >  }
> |
> | Are you sure you want the messages in .onLoad rather than .onAttach?
>
> Thanks Uwe -- looks like that was exactly the hint I needed.
>
> By splitting the task across onLoad and onAttach I seem to be able to get
> want I need even if the package is "tickled" via NAMESPACE's importFrom.
>
> Dirk
>
> | See ?.onLoad and its "Good practice" section:
> |
> | "Loading a namespace should where possible be silent, with startup
> | messages given by .onAttach. These messages (and any essential ones from
> | .onLoad) should use packageStartupMessage so they can be silenced where
> | they would be a distraction."
> |
> | Best,
> | Uwe
> |
> |
> |
> | >
> | > and contrary to your claim, this is not silent as soon as I do
> | >
> | >
> | > importFrom(fooUtils, someThing)
> | >
> | >
> | > the messages above pop up. While I can suppress them for 'normal'
> loads via
> | >
> | > suppressMessages(library(fooUtils))
> | >
> | > or
> | >
> | > suppressPackageStartupMessages(library(fooUtils))
> | >
> | >
> | > I cannot suppress them via NAMESPACE imports.
> |  >
> | > Dirk
> | >
> | > | So I guess your question is: how do we suppress package startup
> messages
> | > | for packages listed in Depends?
> | > |
> | > | Cheers,
> | > | H.
> | > |
> | > |>
> | > |>  Dirk
> | > |>
> | > |
> | > |
> | > | --
> | > | Hervé Pagès
> | > |
> | > | Program in Computational Biology
> | > | Division of Public Health Sciences
> | > | Fred Hutchinson Cancer Research Center
> | > | 1100 Fairview Ave. N, M1-B514
> | > | P.O. Box 19024
> | > | Seattle, WA 98109-1024
> | > |
> | > | E-mail: hpa...@fhcrc.org
> | > | Phone:  (206) 667-5791
> | > | Fax:(206) 667-1319
> | >
>
> --
> "Outside of a dog, a book is a man's best friend. Inside of a dog, it is
> too
> dark to read." -- Groucho Marx
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel