On 09/08/2020 3:01 p.m., John Mount wrote:
Doesn't this preclude "y ~ ." style notations?

Yes, but you can use "y ~ . - w".

Duncan Murdoch
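
A minimal sketch of the "y ~ . - w" suggestion, assuming the weights have been stored as a column named w in the data frame (the column name and the object names fit_dot and fit_x are illustrative only). The dot expands to every column other than the response, and "- w" removes the weights column from the predictors again:

```r
# Assumption: weights live in the data frame as column w.
d <- data.frame(x = 1:3, y = c(3, 3, 4), w = c(1, 5, 1))

# "." would expand to x + w; "- w" drops the weights column from the predictors.
fit_dot <- lm(y ~ . - w, data = d, weights = w)

# The same model written out explicitly, for comparison.
fit_x <- lm(y ~ x, data = d, weights = w)

all.equal(coef(fit_dot), coef(fit_x))
```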



On Aug 9, 2020, at 11:56 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

This is fairly clearly documented in ?lm:

"All of weights, subset and offset are evaluated in the same way as variables in 
formula, that is first in data and then in the environment of formula."

There are lots of possible places to look for weights, but this seems to me 
like a pretty sensible search order.  In most cases the environment of the 
formula will have a parent environment chain that eventually leads to the 
global environment, so (with no conflicts) your strategy of defining w there 
will sometimes work, but looks pretty unreliable.

When you say you want to work around this search order, I think the obvious way 
is to add your w vector to your d dataframe.  That way it is guaranteed to be 
found even if there's a conflicting variable in the formula environment, or the 
global environment.

Duncan Murdoch
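
A small sketch of the approach described above, assuming the weights are added to d as a column named w. The helper make_formula() is hypothetical and exists only to put a conflicting w into the formula environment. Because data is searched before the formula environment, the column should be the one that is used:

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))
d$w <- c(1, 5, 1)  # weights stored as a column of the data frame

# Hypothetical helper: builds a formula whose environment holds a conflicting w.
make_formula <- function() {
  w <- c(-1, -1, -1, -1)  # would be an invalid weights vector for d
  y ~ x
}

# weights = w is looked up in d first, so the conflicting w is never reached.
fit <- lm(make_formula(), data = d, weights = w)
weights(fit)
```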

On 09/08/2020 2:13 p.m., John Mount wrote:
I know programmers can reason this out from R's late parameter evaluation 
rules PLUS the explicit match.call()/eval() lm() does to work with the passed 
in formula and data frame. But, from a statistical user point of view this 
seems to be counter-productive. At best it works as if the user is passing in 
the name of the weights variable instead of values (I know this is the obvious 
consequence of NSE).
lm() takes instance weights from the formula environment. Usually that 
environment is the interactive environment or a close child of the interactive 
environment and we are lucky enough to have no intervening name collisions so 
we don't have a problem. However it makes programming over formulas for lm() a 
bit tricky. Here is an example of the issue.
Is there any recommended discussion on this and how to work around it? In my 
own work I explicitly set the formula environment and put the weights in that 
environment.
d <- data.frame(x = 1:3, y = c(3, 3, 4))
w <- c(1, 5, 1)
# works
lm(y ~ x, data = d, weights = w)
# fails, as weights are taken from formula environment
fn <- function() {  # deliberately set up formula with bad value in environment
   w <- c(-1, -1, -1, -1)  # bad weights
   f <- as.formula(y ~ x)  # captures bad weights with as.formula(env = parent.frame()) default
   return(f)
}
lm(fn(), data = d, weights = w)
# Error in model.frame.default(formula = fn(), data = d, weights = w, drop.unused.levels = TRUE) :
#   variable lengths differ (found for '(weights)')
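
One way the workaround mentioned above (explicitly setting the formula environment and putting the weights in it) might look. This is only a sketch: fit_weighted(), bad_formula() and the private name .fit_wts are hypothetical, and the approach assumes the data frame has no column called .fit_wts, since data is still searched first.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))

# Formula whose environment contains a conflicting w, mirroring fn() above.
bad_formula <- function() {
  w <- c(-1, -1, -1, -1)
  y ~ x
}

# Hypothetical helper: accepts the weights as values rather than as a name.
fit_weighted <- function(f, data, wts) {
  env <- new.env(parent = environment(f))  # keep the original scope reachable
  env$.fit_wts <- wts                      # private name, unlikely to collide
  environment(f) <- env                    # modifies only the local copy of f
  lm(f, data = data, weights = .fit_wts)
}

fit <- fit_weighted(bad_formula(), d, c(1, 5, 1))
coef(fit)
```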
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



