Re: [Rd] Apparent bug in behavior of formulas with '-' operator for lm

2018-03-16 Thread Mark van der Loo
Thanks, Joris. This clarifies at least where exactly it comes from. I still find the high-level behavior of 'predict' very counter-intuitive, as the estimated model contains no coefficients in 'z', but I think we agree on that. I am not sure how much trouble it would be to improve this behavior, b

2018-03-16 Thread Joris Meys
Technically it is used as a predictor in the model. The information is contained in terms:

> terms(x ~ . - z, data = d)
x ~ (y + z) - z
attr(,"variables")
list(x, y, z)
attr(,"factors")
  y
x 0
y 1
z 0
attr(,"term.labels")
[1] "y"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1
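The distinction in the terms output above is between the "variables" attribute (everything the model frame is built from) and the "term.labels" attribute (what actually enters the design matrix). A minimal sketch to inspect both, assuming hypothetical data of the same shape as in the thread (response x, predictor y, factor z):

```r
# Hypothetical data: response x, predictor y, one-level-per-row factor z.
set.seed(1)
d <- data.frame(x = rnorm(12), y = rnorm(12), z = factor(letters[1:12]))

tt <- terms(x ~ . - z, data = d)

# The "variables" attribute is the call list(x, y, z); drop its first
# element (the function name `list`) and deparse the remaining symbols.
vars <- vapply(as.list(attr(tt, "variables"))[-1], deparse, character(1))
print(vars)                      # "x" "y" "z": z is still a model variable

# Only these labels become columns of the design matrix.
print(attr(tt, "term.labels"))   # "y"
```

So z is removed from the terms that get coefficients, but it is still among the variables that model.frame() evaluates.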

2018-03-16 Thread Mark van der Loo
Joris, the point is that 'z' is NOT used as a predictor in the model, so it should not affect predictions. Also, I find it suspicious that the error occurs only when the response variable contains missing values and 'z' is unique (I have tested several other cases to confirm this). -Mark
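Why both conditions matter can be seen by looking at which levels of z survive the removal of incomplete rows. This is a sketch on hypothetical data, not the exact example from the thread; it assumes lm()'s usual na.action (na.omit) and mimics its drop.unused.levels = TRUE with droplevels():

```r
# Hypothetical data: z takes a different value in every row, and the
# response x is missing in the row where z == "a".
set.seed(1)
d <- data.frame(x = c(NA, rnorm(11)), y = rnorm(12),
                z = factor(letters[1:12]))

# na.omit drops every incomplete row; droplevels then removes factor
# levels that no remaining row uses.
kept <- droplevels(na.omit(d))
print("a" %in% levels(kept$z))   # FALSE: no row with level "a" is left

# With repeated z values, another row still carries level "a".
d2 <- data.frame(x = c(NA, rnorm(11)), y = rnorm(12),
                 z = factor(rep(letters[1:6], 2)))
kept2 <- droplevels(na.omit(d2))
print("a" %in% levels(kept2$z))  # TRUE
```

Only when a missing response removes the last row carrying a level does that level disappear from the data the model was fitted on.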

2018-03-16 Thread Joris Meys
It's not a bug per se. It's the effect of removing all observations linked to a certain level in your data frame. So the output of lm() doesn't contain a coefficient for level a of z, but your new data contains that level a. With a small addition, this works again: d <- data.frame(x=rnorm(12),y=rn
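One way to see that z contributes nothing to the fit, and one possible workaround (not necessarily the addition proposed in the truncated message above), is to compare the "- z" formula with the explicit formula x ~ y. This is a hedged sketch on hypothetical data; whether predict() on the "- z" fit actually errors may depend on the R version, so the code only reports what happens:

```r
set.seed(42)
d <- data.frame(x = rnorm(12), y = rnorm(12), z = factor(letters[1:12]))
d$x[1] <- NA   # the only row with z == "a" loses its response

m_minus <- lm(x ~ . - z, data = d)  # formula from the thread
m_plain <- lm(x ~ y, data = d)      # same model written without z

# Both fits drop the same row and use the same design matrix
# (intercept and y), so the coefficients agree.
print(all.equal(coef(m_minus), coef(m_plain)))   # TRUE

# The explicit formula never puts z into the model frame, so
# predicting on the full data, including level "a", works.
p <- predict(m_plain, newdata = d)
print(length(p))   # 12

# The "- z" fit keeps z among its variables; predict() may stop here
# with a "new level" error, so catch and show the result.
res <- tryCatch(predict(m_minus, newdata = d),
                error = function(e) conditionMessage(e))
print(res)
```

Writing the formula without z sidesteps the level check entirely, at the cost of not being able to use the "." shorthand.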