dear list,

below is an edited version of my response to an r user who asked me to explain some issues related to r's evaluation rules. i find the problem interesting enough to forward to the list, hopefully for comments from whoever may want to extend or correct my explanations.

(i'd like to add that, much as i'm happy to receive and answer offline mails, questions related to r are best sent directly to the list, where the real experts are.)

-------- Original Message --------
Subject: Re: evaluation question
Date: Sun, 25 Jan 2009 20:32:22 +0100

xxx wrote:

<snip>

> Someone sent in an example a few days ago showing that prac1 ( see
> below ) doesn't work. Then someone else sent two different
> ways of fixing it.
> I'm still slightly confused.

<snip>

>
> x<-1:10;
> y<-rnorm(10) + x;
>
> # THIS DOES NOT WORK
>
> prac1 <- function( model,wghts){
>   lm( model, weights = wghts)
> }
>
> prac1(model = y~x, wghts = rep(1, 10))

tfm:

" the variables are taken from 'environment(formula)', typically the environment from which 'lm' is called. "

when lm is applied to a model, the variable names used to pass arguments to lm (here, 'wghts') are looked up in the environment where the model was defined. here, you have two environments:

- the global one (say, e_g), where x, y, and prac1 are defined;
- the call-local one (say, e_p1), created when prac1 is applied.

there is a variable name 'wghts' in the latter, but none in the former. just before the call, environmentwise the situation is as follows:

    e_g = { 'x':v1, 'y':v2, 'prac1':v3 }

where e_g contains three mappings (of those we are interested in here), written as <name>:<value>, and none for 'wghts'. (the v1, v2, v3 stand for the respective values, as in the code above.)

when you apply prac1, you create a new, local environment:

    e_p1 = { 'model':v4, 'wghts':v5 }

where v4 is a promise with the expression 'y~x' and evaluation environment e_g (the caller's environment), and v5 is a promise with the expression 'rep(1, 10)' and evaluation environment e_g.

when you call lm, things are a little bit more complicated. after some black magic is performed on the arguments in the lm call, weights are extracted from the model using model.weights, and the lookup of 'wghts' is performed not in e_p1, but in e_g, the environment of the formula.
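
to make the two environments above a bit more tangible, here is a minimal sketch. 'probe' is a hypothetical helper of mine (it is not part of lm or of the original question), and the names e_f and res are likewise made up for illustration; it only mimics prac1's signature and reports where the formula lives and from where 'wghts' is visible. it assumes that no 'wghts' is defined in the environment you run it from.

```r
# hypothetical helper (not part of lm), with the same signature as prac1
probe <- function(model, wghts) {
  e_p1 <- environment()        # the call-local environment of this function
  e_f  <- environment(model)   # forces the 'model' promise; the formula keeps
                               # the environment in which y ~ x was evaluated
  c(formula_env_is_local = identical(e_f, e_p1),
    wghts_in_local       = exists("wghts", envir = e_p1, inherits = FALSE),
    # assumption: no 'wghts' is defined in the caller's environment
    wghts_from_formula   = exists("wghts", envir = e_f))
}

res <- probe(y ~ x, rep(1, 10))
print(res)
```

under the stated assumption i'd expect the formula's environment not to be the local one, 'wghts' to be bound locally, and 'wghts' not to be visible from the formula's environment, which is the situation lm finds itself in.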

    rm(list=ls()) # cleanup

    x = 1:10
    y = rnorm(10)+x

    p1 = function(model, wghts)
       lm(model, weights=wghts)

    p1(y~x, rep(1,10))
    # (somewhat cryptic) error: no variable named 'wghts' found

    wghts = rep(1,10)
    p1(y~x, wghts)
    # now works, e_g has a binding for 'wghts'
    # passing wghts as an argument to p1 makes no difference

note, due to lazy evaluation, the following won't do:

    rm(wghts) # cleanup
    p1(y~x, wghts<-rep(1,10))
    # wghts still not found in e_g: the promise is never forced,
    # so the assignment never happens

if you happen to generalize your p1 over the additional arguments to be passed to lm, ugly surprises await, too:

    p2 = function(model, ...) {
       # some additional code
       lm(model, ...)
    }

    p2(y~x, weights=rep(1,10))
    # (rather cryptic) error

if you want to fit a model with different sets of weights, the following won't do:

    rm(wghts) # cleanup (only a warning if wghts is already gone)
    lapply(
       list(rep(1,10), rep(c(0.5, 1.5), 5)), # alternative weight vectors
       function(weights) p1(y~x, weights))
    # wghts not found: the formula is now created inside the anonymous
    # function, whose environment binds 'weights', not 'wghts', and
    # e_g has no 'wghts' either

but this, incidentally, will work:

    rm(wghts) # cleanup
    lapply(
       list(rep(1,10), rep(c(0.5, 1.5), 5)),
       function(wghts) p1(y~x, wghts))
    # wghts found in the anonymous function's environment (where y~x
    # was created), not in e_p1

as will this:

    rm(wghts) # cleanup
    lapply(
       list(rep(1,10), rep(c(0.5, 1.5), 5)),
       function(wghts) p1(y~x))
    # wghts again found in the anonymous function's environment

but obviously not this:

    rm(wghts) # cleanup
    lapply(
       list(rep(1,10), rep(c(0.5, 1.5), 5)),
       function(weights) p1(y~x))
    # wghts not found

>
> # SOLUTION # 1
>
> prac2 <- function( model,wghts){
>   environment(model) <- environment()
>   lm(model,weights = wghts)
> }
>
> prac2(model = y~x, wghts = rep(1, 10))

environment() returns the local call environment (the analogue of e_p1 above), where 'wghts' is mapped to a promise to evaluate rep(1,10) in e_g. you set the environment of model to that local environment, so that lm looks for wghts there -- and finds it.

this is an 'elegant' workaround, with possibly disastrous consequences if the model happens to include a variable named 'model' or 'wghts':

    model = 1:10
    prac2(y~model, rep(1,10))
    # can't use model in a formula?

    wghts = x
    prac2(y~wghts, rep(1,10))
    # oops, not quite the same
    prac2(y~x, rep(1,10))

another problem with this 'elegant' 'solution' is that if prac2 happens to have local variables with names in conflict with names in the model formula, you're in trouble again:

    prac2 = function(model, wghts) {
       environment(model) = environment()
       x = NULL # for whatever reason one might need an x here
       # whatever
       lm(model, weights = wghts)
    }

    prac2(y~x, rep(1,10))
    # oops, NULL is not a good x in the model

these may be unlikely scenarios, but the issue is serious. you need to understand the details of how lm is implemented in order to understand why your (intuitively correct) example above did not work, to understand why and when your variables can be captured in unexpected ways, and to know how to write 'elegant' 'solutions' to avoid the problems.

this also means that writing general purpose modules to be used by others is likely to cause errors (or silently produce rubbish) if you're not careful enough. (or else you need to prepare your code to handle all exotic situations such as passing weights to prac1 together with a model where one of the variables is named 'model' or 'wghts' -- this is point 2 in a recent comment by Greg Snow, which, as he says, stinks of hubris.)

>
> # SOLUTION # 2
>
> prac3 <- function( model,wghts){
>   cur.env <- environment()
>   lm( model, weights = wghts, data = cur.env )
> }
>
> prac3(model = y~x, wghts = rep(1, 10))
>
>

this is an equally 'good' 'solution', with the above comments equally applicable here. you'd have to tell the user of your modules which variable names are persona non grata.

vQ