I think I am failing to understand how boot() uses the parallel package on
Linux machines. Using R 3.2.2 on three different machines with 2, 4 and 8
cores, I get a slowdown in every case if I use parallel="multicore" and ncpus.
Here's the code that creates a very simple reproducible example:

bootReps <- 500
seed <- 12345
set.seed(seed)
require(boot)
dat <- rnorm(500)
bootMean <- function(dat,ind) {
  mean(dat[ind])
}
start.time <- proc.time()
bootDat <- boot(dat,bootMean,bootReps)
boot.ci(bootDat,type="norm")
stop.time <- proc.time()
elapsed.time1 <- stop.time - start.time
require(parallel)
set.seed(seed)
start.time <- proc.time()
bootDat <- boot(dat,bootMean,bootReps,
                parallel="multicore",
                ncpus=2)
boot.ci(bootDat,type="norm")
stop.time <- proc.time()
elapsed.time2 <- stop.time - start.time
elapsed.time1
elapsed.time2

Running that on my old Dell Latitude E6500 with Debian Squeeze and 32-bit
R 3.2.2 gives me:

> bootReps <- 500
> seed <- 12345
> set.seed(seed)
> require(boot)
> dat <- rnorm(500)
> bootMean <- function(dat,ind) {
+   mean(dat[ind])
+ }
> start.time <- proc.time()
> bootDat <- boot(dat,bootMean,bootReps)
> boot.ci(bootDat,type="norm")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates

CALL : 
boot.ci(boot.out = bootDat, type = "norm")

Intervals : 
Level      Normal        
95%   (-0.0034,  0.1677 )  
Calculations and Intervals on Original Scale
> stop.time <- proc.time()
> elapsed.time1 <- stop.time - start.time
> require(parallel)
> set.seed(seed)
> start.time <- proc.time()
> bootDat <- boot(dat,bootMean,bootReps,
+                 parallel="multicore",
+                 ncpus=2)
> boot.ci(bootDat,type="norm")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates

CALL : 
boot.ci(boot.out = bootDat, type = "norm")

Intervals : 
Level      Normal        
95%   (-0.0030,  0.1675 )  
Calculations and Intervals on Original Scale
> stop.time <- proc.time()
> elapsed.time2 <- stop.time - start.time
> elapsed.time1
   user  system elapsed 
  0.028   0.000   0.174 
> elapsed.time2
   user  system elapsed 
  4.336   2.572   0.166 

That gives a very slightly different 95% CI, reflecting the way that invoking
parallel="multicore" changes the seed setting, and a huge deterioration in
execution speed rather than any improvement.
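When comparing the two runs it may help to look at the wall-clock figure on its
own. The printed "user" and "system" columns of a proc.time() difference are
the sums of the CPU time of this R process and the CPU time of terminated child
processes (here, the workers forked for parallel="multicore"), so those columns
can grow even when "elapsed" does not. A minimal sketch that reuses the dat and
bootMean definitions from the example above; the names t_serial and t_multi are
just illustrative:

```r
library(boot)
library(parallel)

set.seed(12345)
dat <- rnorm(500)
bootMean <- function(dat, ind) mean(dat[ind])

# system.time() returns the same five components as a proc.time() difference
t_serial <- system.time(boot(dat, bootMean, R = 500))
t_multi  <- system.time(boot(dat, bootMean, R = 500,
                             parallel = "multicore", ncpus = 2))

# Wall-clock time is the figure that reflects any speed-up
c(serial = t_serial[["elapsed"]], multicore = t_multi[["elapsed"]])

# unclass() shows CPU time of this R process (user.self, sys.self) separately
# from CPU time of terminated child processes (user.child, sys.child);
# the printed "user" and "system" columns add the two together
unclass(t_multi)
```

This only separates the numbers; it does not by itself explain where the time
goes.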

On a more recent four-core Toshiba, using ncpus=4, again on Debian Squeeze with
32-bit R, I get exactly the same CIs and this timing:

> elapsed.time1
   user  system elapsed 
  0.032   0.000   0.100 
> elapsed.time2
   user  system elapsed 
  0.032   0.020   0.049 

And on an eight-core Mac Mini, also on Squeeze but with 64-bit R, I get the
same CIs and this timing:

> elapsed.time1
   user  system elapsed 
  0.012   0.004   0.017 
> elapsed.time2
   user  system elapsed 
  0.032   0.012   0.024 

I am clearly missing something. Or perhaps something other than CPU power,
such as RAM, is choking the work? I've tried searching for similar reports on
the web and was surprised to find nothing using what seemed plausible search
strategies.
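One way to probe whether a fixed per-run cost (forking workers and collecting
their results), rather than CPU or RAM, swamps any gain is to time both modes
as the number of replicates grows. A sketch, assuming the same dat and bootMean
as above; the replicate counts and the helper name nrep are arbitrary choices
for illustration, and the sketch makes no prediction about which mode wins on
any particular machine:

```r
library(boot)
library(parallel)

set.seed(12345)
dat <- rnorm(500)
bootMean <- function(dat, ind) mean(dat[ind])

# Arbitrary replicate counts, chosen only to span a range of workloads
reps <- c(500, 5000, 20000)

# For each count, record wall-clock time for a serial and a multicore run
timings <- t(sapply(reps, function(nrep) {
  serial <- system.time(boot(dat, bootMean, R = nrep))[["elapsed"]]
  multi  <- system.time(boot(dat, bootMean, R = nrep,
                             parallel = "multicore", ncpus = 2))[["elapsed"]]
  c(reps = nrep, serial = serial, multicore = multi)
}))
timings
```

If the multicore column only pulls ahead at the larger counts, that would be
consistent with overhead dominating for a statistic as cheap as a mean.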

Can anyone help me? I'd desperately like to get a marked speed-up for some
simulation work I'm doing on the Mac Mini, as it's taking days to run at the
moment. The computationally intensive bits in the models are a bit more
complicated than this (!) but most of the workload will be in the
bootstrapping. The function I'm bootstrapping for real is a bit more complex
than a simple mean, though not very complex; it does, however, involve a
stratified bootstrap rather than a simple one. I see very similar marginal
speed _losses_ from invoking more than one core for that work, just as with
this very simple example.
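For reference, a stratified bootstrap in boot() is requested through its strata
argument, which resamples within each stratum so the stratum sizes stay fixed.
A minimal sketch of that setup combined with the same parallel options; the
data frame dat2, its two groups of sizes 200 and 300, and the statistic
diffMeans are all hypothetical stand-ins for the real model:

```r
library(boot)
library(parallel)

set.seed(12345)
# Hypothetical data: two groups of unequal size
dat2 <- data.frame(y = rnorm(500), g = rep(1:2, times = c(200, 300)))

# Hypothetical statistic: difference in group means of the resample
diffMeans <- function(d, ind) {
  d <- d[ind, ]
  mean(d$y[d$g == 2]) - mean(d$y[d$g == 1])
}

# strata = dat2$g makes boot() resample within each group
bootStrat <- boot(dat2, diffMeans, R = 500, strata = dat2$g,
                  parallel = "multicore", ncpus = 2)
boot.ci(bootStrat, type = "norm")
```

Wrapping the boot() call in system.time() and comparing the "elapsed" figures
with and without the parallel arguments gives the same kind of comparison as
for the simple mean.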

TIA,

Chris

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
