None of this is surprising. If the calculations you divide your work into
are small, then the overhead of communicating between parallel processes
is a relatively large penalty to pay. You have to break your problem into
larger chunks and depend on vectorized processing within each process to
keep the CPU busy doing useful work.
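As an illustration of that chunking point (a sketch only; the task counts and core count here are arbitrary assumptions, not anything from the example below), compare handing mclapply many tiny tasks against handing it one large chunk per worker:

```r
library(parallel)

x <- rnorm(500)

# Many tiny tasks: one mean() per task, so the fork/collect
# overhead is paid 10000 times
t_tiny <- system.time(
  mclapply(1:10000, function(i) mean(x), mc.cores = 2)
)[["elapsed"]]

# Two large chunks: each worker loops over 5000 means internally,
# so the overhead is paid only twice
t_chunk <- system.time(
  mclapply(1:2,
           function(j) vapply(1:5000, function(i) mean(x), numeric(1)),
           mc.cores = 2)
)[["elapsed"]]

c(tiny = t_tiny, chunked = t_chunk)
```

On a typical Linux machine the chunked version should come out well ahead, which is the balance of computation vs. communication being described above.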
Also, I am not aware of any model of Mac Mini that has 8 physical cores...
4 is the max. Virtual (hyper-threaded) cores offer a logical
simplification of multiprocessing but do not deliver genuinely improved
performance, because there are only as many physical data paths and
registers as there are physical cores.
Note that your problems are with long-running simulations... your examples
are too small to demonstrate the actual balance of computation vs.
communication overhead. Before you draw conclusions, try increasing
bootReps by a few orders of magnitude, and run your test code a couple of
times to stabilize the memory conditions and obtain some consistency in
the timings.
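A sketch of that kind of test, built from the example code below (the bootReps value and the repeat count of 3 are arbitrary choices; scale them to your machine):

```r
library(boot)
library(parallel)

dat <- rnorm(500)
bootMean <- function(dat, ind) mean(dat[ind])
bootReps <- 50000  # a few orders of magnitude above the original 500

# helper: elapsed wall-clock time for one expression
time_it <- function(expr) system.time(expr)[["elapsed"]]

# run each variant several times so caches and memory settle,
# then compare the medians
serial_t <- replicate(3, time_it(boot(dat, bootMean, bootReps)))
par_t    <- replicate(3, time_it(boot(dat, bootMean, bootReps,
                                      parallel = "multicore",
                                      ncpus = 2)))

c(serial = median(serial_t), parallel = median(par_t))
```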
I have never used the parallel option in the boot package before... I have
always rolled my own so that I can decide how much work to do within the
worker processes before returning from them. (The per-task overhead is
particularly severe when using snow, but it is not necessarily something
you can neglect with multicore either.)
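For illustration, a hand-rolled version of that idea might look like the following (this is not the boot package's internals; the statistic and replicate counts are placeholders). Each worker completes its whole block of replicates before returning anything, so the fork/collect overhead is paid once per worker rather than once per replicate:

```r
library(parallel)

dat <- rnorm(500)
stat <- function(d) mean(d)   # placeholder for the real statistic
reps_per_worker <- 25000
n_workers <- 2

RNGkind("L'Ecuyer-CMRG")      # independent, reproducible worker streams
set.seed(12345)

res <- mclapply(seq_len(n_workers), function(w) {
  # all of this worker's replicates are done before returning
  vapply(seq_len(reps_per_worker), function(i) {
    stat(dat[sample.int(length(dat), replace = TRUE)])
  }, numeric(1))
}, mc.cores = n_workers)

boot_stats <- unlist(res)                  # 50000 bootstrap statistics
quantile(boot_stats, c(0.025, 0.975))      # simple percentile interval
```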
On Sat, 17 Oct 2015, Chris Evans wrote:
I think I am failing to understand how boot() uses the parallel package on
Linux machines. Using R 3.2.2 on three different machines with 2, 4 and 8
cores, I see a slowdown in every case if I use "multicore" and "ncpus".
Here's the code that creates a very simple reproducible example:
bootReps <- 500
seed <- 12345
set.seed(seed)
require(boot)
dat <- rnorm(500)
bootMean <- function(dat, ind) {
  mean(dat[ind])
}
start.time <- proc.time()
bootDat <- boot(dat,bootMean,bootReps)
boot.ci(bootDat,type="norm")
stop.time <- proc.time()
elapsed.time1 <- stop.time - start.time
require(parallel)
set.seed(seed)
start.time <- proc.time()
bootDat <- boot(dat, bootMean, bootReps,
                parallel = "multicore",
                ncpus = 2)
boot.ci(bootDat,type="norm")
stop.time <- proc.time()
elapsed.time2 <- stop.time - start.time
elapsed.time1
elapsed.time2
Running that on my old Dell Latitude E6500 running Debian Squeeze and
using 32 bit R 3.2.2 gives me:
bootReps <- 500
seed <- 12345
set.seed(seed)
require(boot)
dat <- rnorm(500)
bootMean <- function(dat,ind) {
+ mean(dat[ind])
+ }
start.time <- proc.time()
bootDat <- boot(dat,bootMean,bootReps)
boot.ci(bootDat,type="norm")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates
CALL :
boot.ci(boot.out = bootDat, type = "norm")
Intervals :
Level Normal
95% (-0.0034, 0.1677 )
Calculations and Intervals on Original Scale
stop.time <- proc.time()
elapsed.time1 <- stop.time - start.time
require(parallel)
set.seed(seed)
start.time <- proc.time()
bootDat <- boot(dat,bootMean,bootReps,
+ parallel="multicore",
+ ncpus=2)
boot.ci(bootDat,type="norm")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates
CALL :
boot.ci(boot.out = bootDat, type = "norm")
Intervals :
Level Normal
95% (-0.0030, 0.1675 )
Calculations and Intervals on Original Scale
stop.time <- proc.time()
elapsed.time2 <- stop.time - start.time
elapsed.time1
user system elapsed
0.028 0.000 0.174
elapsed.time2
user system elapsed
4.336 2.572 0.166
The 95% CI is very slightly different, reflecting the way that invoking
parallel="multicore" changes the random number sequence, and there is a
huge increase in CPU time (user + system) rather than any improvement.
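[An aside on the seed difference: once random draws happen in forked children, a plain set.seed() in the master may not pin them down. The parallel package's "L'Ecuyer-CMRG" generator gives each child its own reproducible stream. This is a sketch of the general mechanism documented in ?mclapply, not of boot()'s internals:

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # generator designed for parallel streams
set.seed(12345)
r1 <- mclapply(1:4, function(i) rnorm(1), mc.cores = 2)

set.seed(12345)            # same master seed => same child streams
r2 <- mclapply(1:4, function(i) rnorm(1), mc.cores = 2)

identical(r1, r2)          # TRUE: the forked draws are reproducible
```
]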
On a more recent four-core Toshiba, again on Debian Squeeze with 32-bit R
and using ncpus=4, I get exactly the same CIs and this timing:
elapsed.time1
user system elapsed
0.032 0.000 0.100
elapsed.time2
user system elapsed
0.032 0.020 0.049
and on a Mac Mini with eight cores, running Squeeze but with 64-bit R, I
get the same CIs and this timing:
elapsed.time1
user system elapsed
0.012 0.004 0.017
elapsed.time2
user system elapsed
0.032 0.012 0.024
I am clearly missing something, or perhaps something other than CPU power
is choking the work: RAM? I've tried searching for similar reports on the
web and was surprised to find nothing using what seemed plausible search
strategies. Can anyone help me? I'd desperately like to get a marked
speed-up for some simulation work I'm doing on the Mac Mini, as it's
currently taking days to run. The computationally intensive parts of the
models are a bit more complicated than this example (!), but most of the
workload will be in the bootstrapping. The function I'm bootstrapping for
real is a bit more complex than a simple mean, though not hugely so; it
does involve a stratified bootstrap rather than a simple one. I see very
similar marginal speed _losses_ when invoking more than one core for that
work, just as with this very simple example.
TIA,
Chris
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
---------------------------------------------------------------------------
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
DCN: <jdnew...@dcn.davis.ca.us>