On 13-01-03 7:01 PM, Peter Langfelder wrote:
Hello all,
I am running into a problem with garbage collection not being able to
free up all memory. Unfortunately I am unable to provide a minimal
self-contained example, although I can provide a self-contained
example if anyone feels like wading through some 600 lines of code. I
would love to isolate the relevant parts from the code but whenever I
try to run a simpler example, the problem does not appear.
I run an algorithm that repeats the same calculation (on sampled, i.e.
different data) in each iteration. Each iteration uses relatively
large intermediate objects and calculations but returns a smaller
result; these results are then collated and returned from the main
function (call it myFnc). The problem is that memory used by the
intermediate calculations (it is difficult to say whether it's objects
or memory needed for apply calls) does not seem to be freed up even
after doing explicit garbage collection using gc() within the loop.
Thus, a call of something like
result = myFnc(arguments)
results in some memory that does not seem allocated to any visible
objects and yet is not freed up using gc(): After executing an actual
call to the offending function, gc() tells me that Vcells use 538.6
Mb, but the sum of object.size() of all objects listed by ls(all.names
= TRUE) is only 183.3 Mb.
The thing is that if I remove 'result' using rm(result) and do gc()
again, the memory used decreases by a lot: gc() now reports 110.3 Mb
used in Vcells; this roughly corresponds to the sum of the sizes of
all objects returned by ls() (after removing 'result'), which is now
108.7 Mb. So used memory went down by something like 428 Mb but the
object.size of 'result' is only about 75 Mb.
Thus, it seems that the memory used by internal operations in myFnc
that should be freed up upon the completion of the function call
cannot be released by garbage collection until the result of the
function call is also removed.
Like I said, I tried to replicate this behaviour on simple examples
but could not.
My question is, is this behaviour to be expected in complicated code,
or is it a bug that should be reported? Is there any way around it?
Thanks in advance for any insights or pointers.
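For readers who want to reproduce the comparison described above, here is a minimal sketch. The helper names visible_mb() and vcells_mb() are hypothetical (not part of base R), and the sketch assumes that the second column of the matrix returned by gc() is the "used" amount in Mb, as it is in current R versions.

```r
# Total object.size() of all objects bound in an environment, in Mb.
# Like object.size() itself, this ignores anything reachable only through
# environments (closures, formulas) and ignores sharing between objects.
visible_mb <- function(env = globalenv()) {
  nms <- ls(env, all.names = TRUE)
  if (length(nms) == 0) return(0)
  bytes <- vapply(
    nms,
    function(n) as.numeric(object.size(get(n, envir = env))),
    numeric(1)
  )
  sum(bytes) / 2^20
}

# Vcells memory in use (Mb) right after a garbage collection.
vcells_mb <- function() {
  g <- gc()
  g["Vcells", 2]
}

cat("visible objects:", visible_mb(), "Mb; Vcells in use:", vcells_mb(), "Mb\n")
```

A large gap between the two numbers suggests that memory is being held by something object.size() does not count, rather than that gc() has failed.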
I doubt if it is a bug. Remember the warning from ?object.size:
"Exactly which parts of the memory allocation should be attributed to
which object is not clear-cut. This function merely provides a rough
indication: it should be reasonably accurate for atomic vectors, but
does not detect if elements of a list are shared, for example. (Sharing
amongst elements of a character vector is taken into account, but not
that between character vectors in a single object.)
The calculation is of the size of the object, and excludes the space
needed to store its name in the symbol table.
Associated space (e.g. the environment of a function and what the
pointer in a EXTPTRSXP points to) is not included in the calculation."
For a simple example:
> x <- 1:1000000
> object.size(x)
4000024 bytes
> e <- new.env()
> object.size(e)
28 bytes
> e$x <- x
> object.size(e)
28 bytes
At the end, e is an environment holding an object of 4 million bytes,
but its size is 28 bytes. You'll get environments whenever you return
functions from other functions (e.g. what approxfun() does), or when you
create formulas, e.g.
> f <- function() { x <- 1:1000000
+ y <- rnorm(1000000)
+ y ~ x
+ }
> fla <- f()
> object.size(fla)
372 bytes
Now fla is the formula, but the data vectors x and y are part of its
environment, so you can use it in fits:
> summary(lm(fla))
Call:
lm(formula = fla)
Residuals:
    Min      1Q  Median      3Q     Max
-4.8357 -0.6748  0.0002  0.6736  4.4961
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.632e-03  1.998e-03  -1.317    0.188
x            3.302e-09  3.461e-09   0.954    0.340
Residual standard error: 0.9992 on 999998 degrees of freedom
Multiple R-squared: 9.098e-07, Adjusted R-squared: -9.016e-08
F-statistic: 0.9098 on 1 and 999998 DF, p-value: 0.3402
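To see how much memory such an environment keeps alive, and one possible way around it, here is a rough sketch. env_bytes() is a hypothetical helper, not a base R function; it only looks at the objects bound directly in the given environment (not its parents) and shares the limitations of object.size(). The function g() is likewise only an illustration of dropping large intermediates before returning.

```r
# Sum of object.size() over everything bound in one environment, in bytes.
env_bytes <- function(env) {
  nms <- ls(env, all.names = TRUE)
  sum(vapply(
    nms,
    function(n) as.numeric(object.size(get(n, envir = env))),
    numeric(1)
  ))
}

f <- function() {
  x <- 1:1000000
  y <- rnorm(1000000)
  y ~ x
}
fla <- f()
object.size(fla)              # small: just the formula
env_bytes(environment(fla))   # large: x and y are still held here

# Possible workaround when the data are not needed afterwards:
# remove the large intermediates before the function returns.
g <- function() {
  x <- 1:1000000
  y <- rnorm(1000000)
  out <- y ~ x
  rm(x, y)                    # the environment no longer holds the vectors
  out
}
fla2 <- g()
env_bytes(environment(fla2))  # only `out` remains
```

Note that after rm(x, y) the formula can no longer find its data, so lm(fla2) would fail; this only makes sense if the returned object does not need those variables. The same check can be applied to a returned function via environment(fn).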
Duncan Murdoch