Toby,

> On Sep 2, 2015, at 1:12 PM, Toby Hocking <tdho...@gmail.com> wrote:
> 
> Dear R-devel,
> 
> I am running mclapply with many iterations over a function that modifies
> nothing and makes no copies of anything. It is taking up a lot of memory,
> so it seems to me like this is a bug. Should I post this to
> bugs.r-project.org?
> 
> A minimal reproducible example can be obtained by first starting a memory
> monitoring program such as htop, and then executing the following code
> while looking at how much memory is being used by the system
> 
> library(parallel)
> seconds <- 5
> N <- 100000
> result.list <- mclapply(1:N, function(i)Sys.sleep(1/N*seconds))
> 
> On my system, memory usage goes up about 60MB on this example. But it does
> not go up at all if I change mclapply to lapply. Is this a bug?
> 
> For a more detailed discussion with a figure that shows that the memory
> overhead is linear in N, please see
> https://github.com/tdhock/mclapply-memory
> 


I'm not quite sure what is supposed to be the issue here. One would expect the 
memory used to be linear in the number of elements you process - by definition 
of the task, since you'll be creating linearly many more objects.

Also, using top doesn't actually measure the memory used by R itself (see FAQ 
7.42).
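
One way to look at R's own accounting, rather than the process size reported by 
top, is gc(), which runs a collection and reports the memory currently used by 
R objects. A minimal sketch; the helper name r_used_mb is hypothetical and the 
vector size is only illustrative:

```r
# Hypothetical helper: total memory (in Mb) used by R objects after a full gc.
# Column 2 of the matrix returned by gc() is the "(Mb)" figure for "used".
r_used_mb <- function() sum(gc()[, 2])

baseline <- r_used_mb()
x <- numeric(1e7)            # roughly 76 Mb of doubles
holding <- r_used_mb()
rm(x)                        # drop the only reference to the vector
released <- r_used_mb()

cat("baseline:", baseline, "Mb\n")
cat("holding: ", holding, "Mb\n")
cat("released:", released, "Mb\n")
```

The process size shown by top may stay high after rm(x), since freed memory is 
not necessarily handed back to the operating system right away.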

That said, I re-ran your script and the results didn't look anything like what 
you have on your webpage. For the NULL result you end up dealing with all the 
objects you create in your test; they overshadow any other memory usage, which 
stabilizes after garbage collection. As you would expect, any output of top is 
essentially bogus up to a gc. How much memory R will use is essentially 
governed by the level at which you set the gc trigger. In the real world you 
actually want that to be fairly high if you can afford it (in gigabytes, not 
megabytes), because you often get much higher performance by delaying gcs when 
total memory is not scarce (essentially using the memory as a buffer). Given 
that the usage is so negligible, it won't trigger any gc on its own, so you're 
just measuring accumulated objects - which will always be higher for mclapply 
because of the bookkeeping and serialization involved in the communication.
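
The trigger level can be inspected directly: the matrix returned by gc() has a 
"gc trigger" column next to "used". The sketch below compares lapply and 
mclapply only after an explicit gc(), so garbage that has merely accumulated is 
excluded. It assumes a Unix-alike where forking is available; N and seconds are 
scaled down from the original example, and mc.cores = 2 and the helper name 
used_after_gc are assumptions made for the illustration:

```r
library(parallel)

# Hypothetical helper: Mb used by R objects after a forced collection.
used_after_gc <- function() sum(gc()[, 2])

seconds <- 0.5
N <- 1000   # smaller than in the original example, only to keep the run short
f <- function(i) Sys.sleep(seconds / N)

before <- used_after_gc()
res.lapply <- lapply(1:N, f)
after.lapply <- used_after_gc()
res.mclapply <- mclapply(1:N, f, mc.cores = 2)
after.mclapply <- used_after_gc()

# Current use and the level at which the next collection is triggered.
print(gc()[, c("used", "gc trigger")])
cat("lapply growth:  ", after.lapply - before, "Mb\n")
cat("mclapply growth:", after.mclapply - after.lapply, "Mb\n")
```

The initial trigger levels can be raised when starting R with the --min-vsize 
and --min-nsize command-line options; see ?Memory for the details.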

The real difference is only in the df case. The reason for it is that your 
lapply() there is simply a no-op, because R is smart enough to realize that you 
are always returning the same object, so it doesn't actually create anything 
and just returns a reference back to df - thus using no memory at all. However, 
once you split the inputs, your main session can no longer perform this 
optimization because the processing is now in a separate process, so it has no 
way of knowing that you are returning the object unmodified. So what you are 
measuring is a special case that is arguably not really relevant in real 
applications.
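
The difference between returning a reference and returning a copy can be 
approximated within a single session. In the sketch below, the serialize() / 
unserialize() round trip is only a stand-in for a result being sent back from a 
child process; it illustrates the effect and is not the actual mclapply code 
path. The data frame, the sizes and the helper name used_mb are made up for the 
example:

```r
# Hypothetical helper: Mb used by R objects after a forced collection.
used_mb <- function() sum(gc()[, 2])

df <- data.frame(x = rnorm(1e5))   # about 0.8 Mb of doubles
n <- 100

before <- used_mb()

# Same session: every element is just another reference to df.
shared <- lapply(1:n, function(i) df)
after.shared <- used_mb()

# Stand-in for crossing a process boundary: every element is a fresh copy.
copied <- lapply(1:n, function(i) unserialize(serialize(df, NULL)))
after.copied <- used_mb()

cat("growth with shared references:", after.shared - before, "Mb\n")
cat("growth with round-trip copies:", after.copied - after.shared, "Mb\n")
```

The copies are still identical() to df in value; only the main session's 
ability to share one object across all list elements is lost.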

Cheers,
Simon



>> sessionInfo()
> R version 3.2.2 (2015-08-14)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu precise (12.04.5 LTS)
> 
> locale:
> [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_CA.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_CA.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] parallel  graphics  utils     datasets  stats     grDevices methods
> [8] base
> 
> other attached packages:
> [1] ggplot2_1.0.1      RColorBrewer_1.0-5 lattice_0.20-33
> 
> loaded via a namespace (and not attached):
> [1] Rcpp_0.11.6             digest_0.6.4            MASS_7.3-43
> [4] grid_3.2.2              plyr_1.8.1              gtable_0.1.2
> [7] scales_0.2.3            reshape2_1.2.2          proto_1.0.0
> [10] labeling_0.2            tools_3.2.2             stringr_0.6.2
> [13] dichromat_2.0-0         munsell_0.4.2           PeakSegJoint_2015.08.06
> [16] compiler_3.2.2          colorspace_1.2-4
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
