Thanks, Joris,
This at least clarifies where exactly it comes from. I still find the
high-level behavior of 'predict' very counterintuitive, since the estimated
model contains no coefficients for 'z', but I think we agree on that.
I am not sure how much trouble it would be to improve this behavior, b
Technically it is used as a predictor in the model. The information is
contained in the terms object:
> terms(x ~ . - z, data = d)
x ~ (y + z) - z
attr(,"variables")
list(x, y, z)
attr(,"factors")
  y
x 0
y 1
z 0
attr(,"term.labels")
[1] "y"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1
Joris, the point is that 'z' is NOT used as a predictor in the model.
Therefore it should not affect predictions. Also, I find it suspicious that
the error only occurs when the response variable contains missing values and
'z' is unique (I have tested several other cases to confirm this).
-Mark
On Fri
It's not a bug per se. It's the effect of removing all observations linked
to a certain level in your data frame. So the output of lm() doesn't
contain a coefficient for level a of z, but your new data contains that
level a. With a small addition, this works again:
d <- data.frame(x=rnorm(12),y=rn
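
A different way to sidestep the issue, which is not necessarily the addition
meant above, is to name the predictors explicitly instead of writing
'. - z'. The terms object then never references 'z', so its levels cannot
matter at prediction time. This is a sketch on assumed data (12 rows, unique
'z', one missing response):

```r
set.seed(1)

# Assumed data, same shape as in the discussion: unique 'z', one NA in 'x'.
d <- data.frame(x = rnorm(12), y = rnorm(12), z = letters[1:12],
                stringsAsFactors = TRUE)
d$x[1] <- NA

# Explicit formula: 'z' never enters the terms object.
m2 <- lm(x ~ y, data = d)

all.vars(terms(m2))            # "x" "y"
p <- predict(m2, newdata = d)  # one prediction per row of d
length(p)
```

The drawback is that the formula has to be maintained by hand when the data
frame has many columns.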