It looks in data and if not found there in environment(formula) so try this:
mylm <- function(model, wghts) { lm(model, data.frame(wghts), weights = wghts) } On Sun, Jan 25, 2009 at 4:20 PM, Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no> wrote: > dear list, > > below is an edited version of my response to an r user asking me for > explaining some issues related to r's evaluation rules. i find the > problem interesting enough to be forwarded to the list, hopefully for > comments from whoever may want to extend or correct my explanations. > > (i'd like to add that much as i'm happy to receive and answer offline > mails, questions related to r are best sent directly to the list, where > the real experts are.) > > > -------- Original Message -------- > Subject: Re: evaluation question > Date: Sun, 25 Jan 2009 20:32:22 +0100 > > > > > > > > > xxx wrote: > > <snip> > >> Someone sent in an example a few days ago showing that prac1 ( see >> below ) doesn't work. Then someone else sent two different >> ways of fixing it. >> I'm still slightly confused. > > <snip> > > >> >> >> x<-1:10; >> y<-rnorm(10) + x; >> >> # THIS DOES NOT WORK >> >> prac1 <- function( model,wghts){ >> lm( model, weights = wghts) >> } >> >> prac1(model = y~x, wghts = rep(1, 10)) > > tfm: > > " the variables are taken from 'environment(formula)', typically > the environment from which 'lm' is called. " > > when lm is applied to a model, the variable names used to pass arguments > to lm (here, 'wghts') are looked up in the environment where the model > was defined. here, you have two environments: > > - the global one (say, e_g), where x, y, and prac1 are defined; > - the call-local one (say, e_p1), created when prac1 is applied. > > there is a variable name 'wghts' in the latter, but none in the > former. just before the call, environmentwise the situation is as follows: > > e_g = { 'x':v1, 'y':v2, 'prac1':v3 } > > where e_g contains three mappings (of those we are interested here), written > here as <name>:<value>, none for > 'wghts'. (the v1, v2, v3 stand for the respective values, as in the > code above.) > > when you apply prac1, you create a new, local environment: > > e_p1 = { 'model':v4, 'wghts':v5 } > > where v4 is a promise with the expression 'y~x' and evaluation > environment e_g (the caller's environment), and v5 is a promise with the > expression 'rep(1, 10)' and evaluation environment e_g. > > when you call lm, things are a little bit more complicated. after some > black magic is performed on the arguments in the lm call, weights are > extracted from the model using model.weights, and the lookup is > performed not in e_p1, but in e_g. > > rm(list=ls()) # cleanup > x = 1:10 > y = rnorm(10)+x > > p1 = function(model, wghts) > lm(model, weights=wghts) > > p1(y~x, rep(1,10)) > # (somewhat cryptic) error: no variable named 'wghts' found > > wghts = rep(1,10) > p1(y~x, wghts) > # now works, e_g has a binding for 'wghts' > # passing wghts as an argument to p1 makes no difference > > note, due to lazy evaluation, the following won't do: > > rm(wghts) # cleanup > > p1(y~x, wghts<-rep(1,10)) > # wghts still not found in e_g > > > if you happen to generalize your p1 over the additional arguments to be > passed to lm, ugly surprizes await, too: > > p2 = function(model, ...) { > # some additional code > lm(model, ...) } > p2(y~x, weights=rep(1,10)) > # (rather cryptic) error > > > if you want to fit a model with different sets of weights, the following > won't do: > > rm(wghts) # cleanup > lapply( > list(rep(1,10), rep(c(0.5, 1.5), 5)), # alternative weight vectors > function(weights) p1(y~x, weights)) > # wghts not found in e_g, as before > > but this, incidentally, will work: > > rm(wghts) # cleanup > lapply( > list(rep(1,10), rep(c(0.5, 1.5), 5)), > function(wghts) p1(y~x, wghts)) > # wghts found in e_g, not in e_p1 > > as will this: > > rm(wghts) # cleanup > lapply( > list(rep(1,10), rep(c(0.5, 1.5), 5)), > function(wghts) p1(y~x)) > # wghts found in e_g > > but obviously not this: > > rm(wghts) # cleanup > lapply( > list(rep(1,10), rep(c(0.5, 1.5), 5)), > function(weights) p1(y~x)) > # wghts not found > > > >> >> # SOLUTION # 1 >> >> prac2 <- function( model,wghts){ >> environment(model) <- environment() >> lm(model,weights = wghts) >> } >> >> prac2(model = y~x, wghts = rep(1, 10)) > > environment() returns the local call environment (see e_p1 above), where > 'wghts' is mapped to a promise to evaluate rep(1,10) in e_g. you set > the environment of model to e_p1, so that lm looks for wghts there -- > and finds it. > > this is an 'elegant' workaround, with possible disastrous > consequences if the model happens to include a variable named 'model' or > 'wghts': > > model = 1:10 > prac2(y~model, rep(1,10)) > # can't use model in a formula? > > wghts = x > prac2(y~wghts, rep(1,10)) > # oops, not quite the same prac2(y~x, rep(1,10)) > > another problem with this 'elegant' 'solution' is that if prac_ happens to > have > local variables with names in conflict with names in the model formula, you're > in trouble again: > > prac2 = function(model, wghts) { > environment(model) = environment() > x = NULL # for whatever reason one might need an x here > # whatever > lm(model, weights = wghts) } > > prac2(y~x, rep(1,10)) > # oops, NULL is not good an x in the model > > these may be unlikely scenarios, but the issue is serious. you need to > understand the details of how lm is implemented in order to understand > why your (intuitively correct) example above did not work, to understand > why and when your variables can be captured in unexpected ways, and to > know how to write 'elegant' 'solutions' to avoid the problems. > > > this also means that writing general purpose modules to be used by others is > likely > to cause errors (or silently produce rubbish) if you're not careful enough. > (or else you need to prepare your code to handle all exotic situations such as > passing weights to prac1 together with a model where one of the variables is > named 'model' or 'wghts' -- this is point 2 in a recent comment by Greg Snow, > which, as he says, stinks hubris. > >> >> # SOLUTION # 2 >> >> prac3 <- function( model,wghts){ >> cur.env <- environment() >> lm( model, weights = wghts, data = cur.env ) >> } >> >> prac3(model = y~x, wghts = rep(1, 10)) >> >> > > this is an equally 'good' 'solution', with the above comments equally > applicable here. you'd have to tell the user of your modules which variable > names are persona non grata. > > vQ > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.