I know programmers can reason this out from R's lazy argument evaluation
rules plus the explicit match.call()/eval() that lm() uses to work with the
passed-in formula and data frame. But from a statistical user's point of view
this seems counter-productive. At best it works as if the user is passing in
the name of the weights variable instead of its values (I know this is the
obvious consequence of NSE).
lm() looks up instance weights by name, first in the data frame and then in
the formula environment. Usually that environment is the interactive
environment or a close child of it, and we are lucky enough to have no
intervening name collisions, so we don't have a problem. However, it makes
programming over formulas for lm() a bit tricky. An example of the issue is
given below.
Is there any recommended discussion on this and how to work around it? In my
own work I explicitly set the formula environment and put the weights in that
environment.
d <- data.frame(x = 1:3, y = c(3, 3, 4))
w <- c(1, 5, 1)
# works
lm(y ~ x, data = d, weights = w)
# fails, as weights are taken from the formula environment
fn <- function() {
  # deliberately set up a formula with a bad value in its environment
  w <- c(-1, -1, -1, -1)  # bad weights
  # the formula's environment is this function's frame (also the
  # as.formula(env = parent.frame()) default), so f captures the bad weights
  f <- as.formula(y ~ x)
  return(f)
}
lm(fn(), data = d, weights = w)
# Error in model.frame.default(formula = fn(), data = d, weights = w,
#   drop.unused.levels = TRUE) :
# variable lengths differ (found for '(weights)')
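
For concreteness, the workaround mentioned above (explicitly setting the
formula environment and putting the weights in that environment) might be
sketched roughly as follows. This is only an illustration, not a recommended
idiom: the helper name fit_with_weights and the variable name w_fit are
made up for this example, and the child environment is one possible choice
that keeps the formula's original variables reachable.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))
w <- c(1, 5, 1)

fn <- function() {
  w <- c(-1, -1, -1, -1)  # bad weights living in the formula environment
  as.formula(y ~ x)
}

# Hypothetical helper: attach the intended weights to the formula's
# environment under a name (w_fit) that is unlikely to collide.
fit_with_weights <- function(f, data, weights) {
  # child of the original formula environment, so other lookups still work
  env <- new.env(parent = environment(f))
  assign("w_fit", weights, envir = env)
  environment(f) <- env
  # lm() resolves w_fit in data first, then in environment(f)
  lm(f, data = data, weights = w_fit)
}

fit <- fit_with_weights(fn(), d, w)
ref <- lm(y ~ x, data = d, weights = w)  # the direct call that works
all.equal(coef(fit), coef(ref))
```

Because the weights are stored under their own name in a fresh child
environment, the bad w in fn()'s frame is never consulted.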