Doesn't this preclude "y ~ ." style notations?

> On Aug 9, 2020, at 11:56 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>
> This is fairly clearly documented in ?lm:
>
> "All of weights, subset and offset are evaluated in the same way as variables
> in formula, that is first in data and then in the environment of formula."
>
> There are lots of possible places to look for weights, but this seems to me
> like a pretty sensible search order. In most cases the environment of the
> formula will have a parent environment chain that eventually leads to the
> global environment, so (with no conflicts) your strategy of defining w there
> will sometimes work, but looks pretty unreliable.
>
> When you say you want to work around this search order, I think the obvious
> way is to add your w vector to your d dataframe. That way it is guaranteed
> to be found even if there's a conflicting variable in the formula
> environment, or the global environment.
>
> Duncan Murdoch
>
> On 09/08/2020 2:13 p.m., John Mount wrote:
>> I know programmers can reason this out from R's late parameter
>> evaluation rules PLUS the explicit match.call()/eval() lm() does to work
>> with the passed-in formula and data frame. But, from a statistical user's
>> point of view, this seems to be counter-productive. At best it works as if
>> the user is passing in the name of the weights variable instead of values (I
>> know this is the obvious consequence of NSE).
>>
>> lm() takes instance weights from the formula environment. Usually that
>> environment is the interactive environment or a close child of the
>> interactive environment, and we are lucky enough to have no intervening name
>> collisions, so we don't have a problem. However, it makes programming over
>> formulas for lm() a bit tricky. Here is an example of the issue.
>>
>> Is there any recommended discussion on this and how to work around it? In my
>> own work I explicitly set the formula environment and put the weights in
>> that environment.
>>
>> d <- data.frame(x = 1:3, y = c(3, 3, 4))
>> w <- c(1, 5, 1)
>>
>> # works
>> lm(y ~ x, data = d, weights = w)
>>
>> # fails, as weights are taken from the formula environment
>> fn <- function() {
>>   # deliberately set up formula with bad value in environment
>>   w <- c(-1, -1, -1, -1)  # bad weights
>>   f <- as.formula(y ~ x)  # formula captures fn()'s environment, including the bad w
>>   return(f)
>> }
>> lm(fn(), data = d, weights = w)
>> # Error in model.frame.default(formula = fn(), data = d, weights = w,
>> #   drop.unused.levels = TRUE) :
>> #   variable lengths differ (found for '(weights)')
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
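A minimal sketch of the search order quoted from ?lm, assuming Duncan's suggestion of storing the weights as a column w of d. The helper fn() is re-created here so the block runs on its own, and fit_data is an illustrative name. Because weights = w is evaluated in data before the formula environment, the column is used and the wrong-length w captured by fn() is never reached.

```r
# Toy data from the thread, with the weights stored as a column of the data frame
d <- data.frame(x = 1:3, y = c(3, 3, 4))
d$w <- c(1, 5, 1)

# Formula whose environment holds a conflicting, wrong-length w
fn <- function() {
  w <- c(-1, -1, -1, -1)
  y ~ x  # the formula captures fn()'s evaluation environment
}

# weights = w is evaluated first in data, so d$w is used and fn()'s w is never seen
fit_data <- lm(fn(), data = d, weights = w)
print(coef(fit_data))
```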
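The question at the top of the thread can be made concrete. A minimal sketch, assuming the weights are stored in a column named w (d_w, fit_dot and fit_minus are illustrative names): with y ~ . the column is expanded into the predictors as well as used for the weights. Subtracting it with the ordinary formula operator, y ~ . - w, is one possible workaround; it is not proposed in the thread itself.

```r
# Weights stored as a column, as suggested in the thread
d_w <- data.frame(x = 1:3, y = c(3, 3, 4), w = c(1, 5, 1))

# "." expands to every non-response column, so w also becomes a predictor
fit_dot <- lm(y ~ ., data = d_w, weights = w)
print(names(coef(fit_dot)))

# Subtracting w keeps "." usable, while weights = w is still found in data
fit_minus <- lm(y ~ . - w, data = d_w, weights = w)
print(names(coef(fit_minus)))
```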
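John's own workaround, setting the formula environment and putting the weights there, might look roughly like this. lm_with_weights() and the .lm_weights name are hypothetical, not part of stats. The new environment's parent is the formula's original environment, so other free variables still resolve as before. The weights are still looked up in data first, so a data column named .lm_weights would shadow them.

```r
# Hypothetical helper: wrap the formula in a fresh environment that holds the weights
lm_with_weights <- function(formula, data, wts) {
  # Child of the formula's original environment, so other free variables still resolve
  env <- new.env(parent = environment(formula))
  assign(".lm_weights", wts, envir = env)
  environment(formula) <- env
  # weights is looked up in data first, then in env (the formula environment)
  lm(formula, data = data, weights = .lm_weights)
}

d <- data.frame(x = 1:3, y = c(3, 3, 4))
fit_env <- lm_with_weights(y ~ x, data = d, wts = c(1, 5, 1))
print(coef(fit_env))
```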