On 13-01-03 7:01 PM, Peter Langfelder wrote:
Hello all,

I am running into a problem with garbage collection not being able to
free up all memory. Unfortunately I am unable to provide a minimal
self-contained example, although I can provide a self-contained
example if anyone feels like wading through some 600 lines of code. I
would love to isolate the relevant parts from the code but whenever I
try to run a simpler example, the problem does not appear.

I run an algorithm that repeats the same calculation (on sampled, i.e.
different data) in each iteration. Each iteration uses relatively
large intermediate objects and calculations but returns a smaller
result; these results are then collated and returned from the main
function (call it myFnc). The problem is that memory used by the
intermediate calculations (it is difficult to say whether it's objects
or memory needed for apply calls) does not seem to be freed up even
after doing explicit garbage collection using gc() within the loop.

Thus, a call of something like

result = myFnc(arguments)

results in some memory that does not seem to be allocated to any visible
objects and yet is not freed up using gc(): After executing an actual
call to the offending function, gc() tells me that Vcells use 538.6
Mb, but the sum of object.size() of all objects listed by ls(all.names
= TRUE) is only 183.3 Mb.


The thing is that if I remove 'result' using rm(result) and do gc()
again, the memory used decreases by a lot: gc() now reports 110.3 Mb
used in Vcells; this roughly corresponds to the sum of the sizes of
all objects returned by ls() (after removing 'result'), which is now
108.7 Mb. So used memory went down by something like 428 Mb but the
object.size of 'result' is only about 75 Mb.

Thus, it seems that the memory used by internal operations in myFnc
that should be freed up upon the completion of the function call
cannot be released by garbage collection until the result of the
function call is also removed.

Like I said, I tried to replicate this behaviour on simple examples
but could not.

My question is, is this behaviour to be expected in complicated code,
or is it a bug that should be reported? Is there any way around it?

Thanks in advance for any insights or pointers.

I doubt if it is a bug.  Remember the warning from ?object.size:

"Exactly which parts of the memory allocation should be attributed to which object is not clear-cut. This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but does not detect if elements of a list are shared, for example. (Sharing amongst elements of a character vector is taken into account, but not that between character vectors in a single object.)

The calculation is of the size of the object, and excludes the space needed to store its name in the symbol table.

Associated space (e.g. the environment of a function and what the pointer in a EXTPTRSXP points to) is not included in the calculation."

For a simple example:

> x <- 1:1000000
> object.size(x)
4000024 bytes
> e <- new.env()
> object.size(e)
28 bytes
> e$x <- x
> object.size(e)
28 bytes

At the end, e is an environment holding an object of 4 million bytes, but object.size(e) still reports 28 bytes, because the contents of an environment are not counted. You'll get environments whenever you return functions from other functions (e.g. what approxfun() does), or when you create formulas, e.g.

> f <- function() { x <- 1:1000000
+  y <- rnorm(1000000)
+  y ~ x
+ }

> fla <- f()
> object.size(fla)
372 bytes

Now fla is the formula, but the data vectors x and y are part of its environment, so you can use it in fits:

> summary(lm(fla))

Call:
lm(formula = fla)

Residuals:
    Min      1Q  Median      3Q     Max
-4.8357 -0.6748  0.0002  0.6736  4.4961

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.632e-03  1.998e-03  -1.317    0.188
x            3.302e-09  3.461e-09   0.954    0.340

Residual standard error: 0.9992 on 999998 degrees of freedom
Multiple R-squared: 9.098e-07,  Adjusted R-squared: -9.016e-08
F-statistic: 0.9098 on 1 and 999998 DF,  p-value: 0.3402
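
To see how much memory a result like this keeps alive, one can add up the sizes of everything in its environment. The following is only a minimal sketch: env_size() and g() are hypothetical names introduced here for illustration, not base R functions, and env_size() does not descend into parent environments or account for sharing. The second half shows one possible way around the problem, assuming the returned formula does not need to find x and y later.

```r
# Hypothetical helper (not part of base R): sum object.size() over every
# object stored directly in an environment, including hidden names.
env_size <- function(env) {
  objs <- mget(ls(env, all.names = TRUE), envir = env)
  sum(vapply(objs, function(o) as.numeric(object.size(o)), numeric(1)))
}

# Same function as in the example above.
f <- function() {
  x <- 1:1000000
  y <- rnorm(1000000)
  y ~ x
}

fla <- f()
ls(environment(fla))        # names of the locals of f() that fla keeps alive
env_size(environment(fla))  # several million bytes, unlike object.size(fla)

# Possible workaround (an assumption, not something from the original post):
# point the formula at the global environment before returning it, so the
# frame holding the large intermediates can be garbage collected.
g <- function() {
  x <- 1:1000000
  y <- rnorm(1000000)
  fla <- y ~ x
  environment(fla) <- globalenv()
  fla
}

fla2 <- g()
identical(environment(fla2), globalenv())
```

The trade-off is that lm(fla2) can no longer find x and y on its own, so the data would have to be supplied through the data argument. The same idea applies to returned functions: remove large intermediates with rm() inside the creating function before returning, or build the closure in an environment that does not contain them.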


Duncan Murdoch
