Comments below.
On 2014-03-06 11:17, Henric Winell wrote:
Hi,
I've implemented parallelization in one of my packages using the
'parallel' package -- many thanks for providing it!
In my package I'm importing 'parallel', so I added it to the 'Imports:'
field of the DESCRIPTION file and also added an
'importFrom("parallel", ...)' statement to the NAMESPACE file.
Parallelization works nicely, but my package no longer passes those parts
of its (unparallelized) checks that depend on random number generation;
e.g., the simulated data in the check suite are no longer the same as
before parallelization was added. This seems to be due to 'parallel'
changing '.Random.seed' when its name space is loaded:
> set.seed(1)
> rs1 <- .Random.seed
> rnorm(1)
[1] -0.6264538
> set.seed(1)
> rs2 <- .Random.seed
> identical(rs1, rs2)
[1] TRUE
> loadNamespace("parallel")
<environment: namespace:parallel>
> rs3 <- .Random.seed
> identical(rs1, rs3)
[1] FALSE
> rnorm(1)
[1] -0.3262334
> set.seed(1)
> rs4 <- .Random.seed
> identical(rs1, rs4)
[1] TRUE
I've taken a look at the 'parallel' source code, and in a few places a
call to 'runif(1)' is issued. So, what effectively seems to happen when
'parallel' is loaded is
> set.seed(1)
> runif(1)
[1] 0.2655087
> rnorm(1)
[1] -0.3262334
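A small diagnostic can make this kind of side effect visible in a check suite. The sketch below assumes only base R; 'touches_rng' is a hypothetical helper name, not part of base R or 'parallel'. It compares the global '.Random.seed' before and after evaluating an expression, treating a missing seed as NULL.

```r
## Hypothetical helper (not part of base R or 'parallel'): reports whether
## evaluating 'expr' changes the global RNG state '.Random.seed'.
touches_rng <- function(expr) {
  get_seed <- function() {
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE))
      get(".Random.seed", envir = globalenv(), inherits = FALSE)
    else
      NULL  # no RNG state has been created yet
  }
  before <- get_seed()
  force(expr)  # 'expr' is a promise; evaluate it only after saving the state
  !identical(before, get_seed())
}
```

For example, touches_rng(runif(1)) returns TRUE, while touches_rng(sum(1:10)) returns FALSE. Calling it on loadNamespace("parallel") would be expected to return TRUE under the behaviour described above, but only if the name space is not already loaded.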
Some digging reveals that this happens because no port number for the
socket connection is set by default, in which case 'parallel' picks a
random port in the 11000-11999 range using 'runif(1L)'. So, when
R_PARALLEL_PORT is set, the '.Random.seed' object is no longer touched:
> Sys.setenv(R_PARALLEL_PORT = 11500)
> set.seed(1)
> rs1 <- .Random.seed
> loadNamespace("parallel")
<environment: namespace:parallel>
> rs2 <- .Random.seed
> identical(rs1, rs2)
[1] TRUE
This is handled in the 'initDefaultClusterOptions' function in 'snow.R',
where line 88 has
port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
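To see why the result lands in 11000-11999, the expression can be pulled apart with the uniform draw and the clock value passed in explicitly. This is an illustrative sketch; 'port_from' is a hypothetical name, and the truncation via as.integer() mirrors the 'as.integer(port)' applied later in the same function. The term now/300 grows by 1 every 300 seconds, so its contribution to the fractional part cycles every five minutes.

```r
## Sketch of the port expression from snow.R with its two random-ish inputs
## made explicit: 'u' stands in for runif(1L), 'now' for unclass(Sys.time()).
## 'port_from' is a hypothetical name used only for illustration.
port_from <- function(u, now) {
  frac <- (u + now / 300) %% 1      # fractional part, in [0, 1)
  as.integer(11000 + 1000 * frac)   # truncated toward zero
}

port_from(0.25, 0)  # 11250
```

In the real code, u comes from stats::runif(1L), which is exactly the draw that advances '.Random.seed'.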
It seems to me that we can tread more carefully here. I've attached a
trivial patch that
1. Checks whether '.Random.seed' exists.
2. If TRUE: a) saves '.Random.seed'
            b) makes the call above
            c) resets '.Random.seed' to the state saved in a)
   If FALSE: a) makes the call above
             b) removes the '.Random.seed' created by that call
In due course I hope someone is interested enough to review it.
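The save/restore steps listed above can be sketched as a stand-alone R helper. 'with_preserved_seed' is a hypothetical name, not a function in 'parallel' or base R, and this is a sketch rather than the patch itself. Unlike the patch, it restores the state via on.exit(), a design choice that also puts the seed back if 'expr' signals an error.

```r
## Hypothetical helper (not part of 'parallel'): evaluate 'expr', then put
## the global RNG state back exactly as it was beforehand.
with_preserved_seed <- function(expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed)
    old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit({
    if (had_seed) {
      # restore the saved state
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      # no seed existed before, so drop the one 'expr' created
      rm(".Random.seed", envir = genv)
    }
  })
  expr  # forcing the promise here consumes random numbers, if any
}
```

After set.seed(1), with_preserved_seed(runif(1)) returns the same value as a plain runif(1) issued afterwards, because the stream is rewound to the set.seed(1) state.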
Henric Winell
which reproduces the above. But is this really necessary? And more
importantly (at least to me): Can it somehow be avoided?
The current state of affairs is a bit unfortunate, since it means that
a user who merely loads the new parallelized version of my package can
no longer reproduce any subsequent results that depend on random number
generation (unless 'set.seed' is called *after* my package has been
attached).
I'd be most grateful for any help that you're able to provide here. Many
thanks!
Kind regards,
Henric Winell
> sessionInfo()
R Under development (unstable) (2014-01-26 r64897)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=sv_SE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.1.0 parallel_3.1.0 tools_3.1.0
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Index: snow.R
===================================================================
--- snow.R (revision 65125)
+++ snow.R (working copy)
@@ -84,8 +84,16 @@
rscript <- file.path(R.home("bin"), "Rscript")
port <- Sys.getenv("R_PARALLEL_PORT")
port <- if (identical(port, "random")) NA else as.integer(port)
- if (is.na(port))
- port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+ if (is.na(port)) {
+ if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
+ seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
+ port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+ assign(".Random.seed", seed, envir = .GlobalEnv, inherits = FALSE)
+ } else {
+ port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
+ rm(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
+ }
+ }
options <- list(port = as.integer(port),
timeout = 60 * 60 * 24 * 30, # 30 days
master = Sys.info()["nodename"],