Thanks for your reply, Duncan - you hit the nail on the head (as usual, the problem turned out to sit between the keyboard and the chair :)). My function does return regression models that contain the input formulae together with their associated (big) environments. A minimal sketch of the fix is below, and two more sketches follow after the quoted thread.
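For the archives, here is a minimal sketch of the kind of fix this points to (illustrative only, not my actual 600-line function; names like myFnc and bigIntermediate are made up, and it assumes ordinary lm fits - other model classes may store their formula elsewhere). One caveat: re-pointing the environment can break a later update() or predict() call that needs to look things up in the original frame, so it is a trade-off.

myFnc <- function(dat) {
  ## Large scratch object (~150 Mb); this is the memory gc() could not free.
  bigIntermediate <- matrix(rnorm(2e7), ncol = 10)
  fit <- lm(y ~ x, data = dat)
  ## The fit's terms (and the copy attached to the model frame) capture
  ## this function's evaluation frame, and that keeps bigIntermediate
  ## alive. Re-point them at a small environment before returning.
  environment(fit$terms) <- globalenv()
  environment(attr(fit$model, "terms")) <- globalenv()
  fit
}

dat <- data.frame(x = rnorm(100), y = rnorm(100))
result <- myFnc(dat)
gc()   # the scratch memory is now reclaimable even while 'result' exists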
Peter

On Thu, Jan 3, 2013 at 4:41 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> On 13-01-03 7:01 PM, Peter Langfelder wrote:
>>
>> Hello all,
>>
>> I am running into a problem with garbage collection not being able to
>> free up all memory. Unfortunately I am unable to provide a minimal
>> self-contained example, although I can provide a self-contained
>> example if anyone feels like wading through some 600 lines of code. I
>> would love to isolate the relevant parts from the code, but whenever I
>> try to run a simpler example, the problem does not appear.
>>
>> I run an algorithm that repeats the same calculation (on sampled, i.e.
>> different, data) in each iteration. Each iteration uses relatively
>> large intermediate objects and calculations but returns a smaller
>> result; these results are then collated and returned from the main
>> function (call it myFnc). The problem is that memory used by the
>> intermediate calculations (it is difficult to say whether it's objects
>> or memory needed for apply calls) does not seem to be freed up even
>> after doing explicit garbage collection using gc() within the loop.
>>
>> Thus, a call of something like
>>
>> result = myFnc(arguments)
>>
>> results in some memory that does not seem allocated to any visible
>> objects and yet is not freed up using gc(): after executing an actual
>> call to the offending function, gc() tells me that Vcells use 538.6
>> Mb, but the sum of object.size() of all objects listed by
>> ls(all.names = TRUE) is only 183.3 Mb.
>>
>> The thing is that if I remove 'result' using rm(result) and do gc()
>> again, the memory used decreases by a lot: gc() now reports 110.3 Mb
>> used in Vcells; this roughly corresponds to the sum of the sizes of
>> all objects returned by ls() (after removing 'result'), which is now
>> 108.7 Mb. So used memory went down by something like 428 Mb, but the
>> object.size of 'result' is only about 75 Mb.
>>
>> Thus, it seems that the memory used by internal operations in myFnc
>> that should be freed up upon the completion of the function call
>> cannot be released by garbage collection until the result of the
>> function call is also removed.
>>
>> Like I said, I tried to replicate this behaviour on simple examples
>> but could not.
>>
>> My question is, is this behaviour to be expected in complicated code,
>> or is it a bug that should be reported? Is there any way around it?
>>
>> Thanks in advance for any insights or pointers.
>
> I doubt if it is a bug. Remember the warning from ?object.size:
>
> "Exactly which parts of the memory allocation should be attributed to
> which object is not clear-cut. This function merely provides a rough
> indication: it should be reasonably accurate for atomic vectors, but
> does not detect if elements of a list are shared, for example. (Sharing
> amongst elements of a character vector is taken into account, but not
> that between character vectors in a single object.)

If I understand correctly, sharing would inflate the sum of
object.size()'s relative to the values returned by gc(), correct? The
opposite is happening in my case. (A small sketch of this bookkeeping
follows after the quoted thread.)

> The calculation is of the size of the object, and excludes the space
> needed to store its name in the symbol table.
>
> Associated space (e.g. the environment of a function and what the
> pointer in a EXTPTRSXP points to) is not included in the calculation."
> For a simple example:
>
>> x <- 1:1000000
>> object.size(x)
> 4000024 bytes
>> e <- new.env()
>> object.size(e)
> 28 bytes
>> e$x <- x
>> object.size(e)
> 28 bytes
>
> At the end, e is an environment holding an object of 4 million bytes,
> but its size is 28 bytes. You'll get environments whenever you return
> functions from other functions (e.g. what approxfun() does), or when
> you create formulas, e.g.
>
>> f <- function() { x <- 1:1000000
> + y <- rnorm(1000000)
> + y ~ x
> + }
>
>> fla <- f()
>> object.size(fla)
> 372 bytes
>
> Now fla is the formula, but the data vectors x and y are part of its
> environment.
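To see where the memory goes in Duncan's example, one can continue it like this (a sketch):

ls(environment(fla))             # "x" "y": f()'s whole frame is retained
object.size(environment(fla)$y)  # 8000048 bytes
## Re-pointing the formula's environment releases f()'s frame, so the
## ~12 Mb held by x and y become collectable:
environment(fla) <- globalenv()
gc()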
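And for completeness, the bookkeeping I referred to above - comparing what gc() reports for Vcells against the summed object.size() of everything visible in the workspace (a sketch; a large gap suggests memory reachable only through hidden references such as captured environments):

vcellsMb  <- gc()["Vcells", "(Mb)"]   # Mb currently used in Vcells
objectsMb <- sum(vapply(
  ls(envir = globalenv(), all.names = TRUE),
  function(n) as.numeric(object.size(get(n, envir = globalenv()))),
  numeric(1))) / 2^20
cat(sprintf("Vcells: %.1f Mb; sum of object.size(): %.1f Mb\n",
            vcellsMb, objectsMb))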