On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
Dear Duncan

thank you for taking the time to answer my questions! It will be quite
some work to delete all the objects generated inside the function
... but if there is no other way to avoid a large environment then this
is what I will do.

It's not really that hard. Use names <- ls() in the function to get a list of all of them; remove the names of variables that might be needed in the formula (and the name of the formula itself); then use rm(list=names) to delete everything else just before returning it.
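A minimal sketch of that cleanup, under stated assumptions: the function name fit_sketch, the temporary big, and the variable w that the formula refers to are all hypothetical and only stand in for whatever a real function defines.

```r
# Hypothetical function: 'big' is a large temporary, 'w' is needed by the formula.
fit_sketch <- function(n = 1e5) {
  big <- rnorm(n)                 # large object the formula does not need
  w <- 2                          # value the formula refers to
  out <- list()
  out$f <- y ~ I(x * w)

  names <- ls()                               # everything defined so far
  names <- setdiff(names, c("out", "w"))      # keep the result and what the formula needs
  rm(list = names)                            # drop the rest just before returning
  out
}

v <- fit_sketch()
ls(environment(v$f))   # 'big' and 'n' are gone; 'names', 'out' and 'w' remain
```

Note that the small character vector names itself stays behind in the environment; it could be removed as well with rm(list = c(names, "names")).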

Duncan Murdoch


Cheers
Thomas

Duncan Murdoch <murdoch.dun...@gmail.com> writes:

On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
Dear List
I have found that objects generated with one of my packages
use a lot of space when saved to disk (object.size did not show
this!).
Some debugging revealed that formula and call objects carry the
full environment of subroutines along, including even objects not
needed by the formula or call. Here is a sketch of the problem:
,----
| test <- function(x){
|   x <- rnorm(1000000)
|   out <- list()
|   out$f <- a~b
|   out
| }
| v <- test(1)
| save(v,file="~/tmp/v.rda")
| system("ls -lah ~/tmp/v.rda")
| -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
`----
I tried to replace a~b in the function body by
,----
| as.formula(a~b,env=emptyenv()) or as.formula(a~b,env=NULL)
`----
without the desired effect. Instead adding either
,----
| environment(out$f) <- emptyenv() or environment(out$f) <- NULL
`----
has the desired effect (i.e. the saved object is
smaller). Unfortunately, there is a new problem:
,----
| test <- function(x){
|   x <- rnorm(1000000)
|   out <- list()
|   out$f <- a~b
|   environment(out$f) <- emptyenv()
|   out
| }
| d <- data.frame(a=1,b=1)
| v <- test(1)
| model.frame(v$f,data=d)
| Error in eval(expr, envir, enclos) : could not find function "list"
`----
The same happens with NULL in place of emptyenv().
Finally, using .GlobalEnv in place of emptyenv() seems to remove both
problems.

But it will cause other, less obvious problems.  In a formula, the
symbols mean something.  By setting the environment to .GlobalEnv
you're changing the meaning.  You'll get nonsense in certain cases
when functions look up the meaning of those symbols and find the wrong
thing. (I don't have an example at hand, but I imagine it would be
easy to put one together with update().)
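One possible illustration, using model.frame() rather than update(): a sketch that assumes a formula referring to a local variable k, while an unrelated k also exists in the global environment. The names make_formula, k and d are hypothetical.

```r
# Hypothetical example: the formula refers to 'k', defined locally in the function.
make_formula <- function(use_global = FALSE) {
  k <- 2                                   # the value the formula is meant to use
  f <- y ~ I(x^k)
  if (use_global) environment(f) <- globalenv()
  f
}

# An unrelated global variable that happens to share the name.
assign("k", 3, envir = globalenv())

d <- data.frame(x = 1:3, y = 1:3)
local_mf  <- model.frame(make_formula(FALSE), data = d)
global_mf <- model.frame(make_formula(TRUE),  data = d)

as.numeric(local_mf[[2]])    # computed with the local k
as.numeric(global_mf[[2]])   # computed with the global k, with no warning
```

If no global k existed, the second call would instead fail because k could not be found.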

My questions:
1) why does the argument env of as.formula have no effect?

Because the first argument already had an associated environment.  You
passed a ~ b, which is evaluated to a formula; calling as.formula on a
formula does nothing. The env argument is only used when a new formula
needs to be constructed.  (You can see this in the source code;
as.formula is a very simple function.)
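A short sketch of that behaviour, assuming a fresh environment e created only for the comparison:

```r
e <- new.env()

f <- a ~ b                        # already a formula; its environment is where it was created
g <- as.formula(f, env = e)       # returned unchanged, env is ignored
identical(environment(g), environment(f))

h <- as.formula("a ~ b", env = e) # a character string: a new formula is constructed
identical(environment(h), e)
```

Both comparisons are expected to be TRUE: the env argument only takes effect in the second call.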

2) is there a better way to tell formula not to copy unrelated stuff
into the associated environment?

Yes, delete it.  For example, you could write your function as

  test <- function(x){
    x <- rnorm(1000000)
    out <- list()
    out$f <- a~b
    rm(x)
    out
  }
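One way to check the effect is to list what is left in the formula's environment and to look at the serialized size. This is only a sketch; serialize() is used here as a stand-in for what save() has to write before compression.

```r
test <- function(x){
  x <- rnorm(1000000)
  out <- list()
  out$f <- a~b
  rm(x)              # drop the large vector before returning
  out
}

v <- test(1)
remaining <- ls(environment(v$f))     # x is no longer listed
bytes <- length(serialize(v, NULL))   # size of the serialized object, in bytes
remaining
bytes
```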

3) why does object.size not show the size of the environments that
formulas can carry along?

Because many objects can share the same environment.  See ?object.size
for more details.
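The difference can be made visible by comparing object.size() with the length of the serialized object. A rough sketch, assuming a hypothetical function make_obj and a smaller vector than in the original example to keep it quick:

```r
# Hypothetical function: 'x' is captured by the formula's environment.
make_obj <- function(n = 1e5) {
  x <- rnorm(n)
  out <- list()
  out$f <- a ~ b
  out
}

v <- make_obj()
reported   <- as.numeric(object.size(v))   # does not include the environment
serialized <- length(serialize(v, NULL))   # includes the environment and therefore x
c(reported = reported, serialized = serialized)
```

The serialized length is dominated by the 1e5 doubles in x (about 8 bytes each), while object.size() reports only the small list and formula.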

Duncan Murdoch


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
