Dear R-Devel,
I no longer seem to be able to access the bug-reporting system, so I am
reporting this by e-mail.
My report concerns models where variables are explicitly referenced (or is it
"dereferenced"?), such as:
cars.lm <- lm(mtcars[[1]] ~ factor(mtcars$cyl) + mtcars[["disp"]])
I have found that it is not possible to obtain predictions from such models for
new data. For example:
> predict(cars.lm, newdata = mtcars[1:5, ])
       1        2        3        4        5        6        7        8        9       10
20.37954 20.37954 26.58543 17.70329 14.91157 18.60448 14.91157 25.52859 25.68971 20.17199
      11       12       13       14       15       16       17       18       19       20
20.17199 17.21096 17.21096 17.21096 11.85300 12.18071 12.72688 27.38558 27.46750 27.59312
      21       22       23       24       25       26       27       28       29       30
26.25500 16.05853 16.44085 15.18466 13.81922 27.37738 26.24954 26.93772 15.15735 20.78917
      31       32
16.52278 26.23042
Warning message:
'newdata' had 5 rows but variables found have 32 rows
Instead of returning 5 predictions, it returns the 32 original fitted values.
There is a warning message suggesting that something went wrong. This tickled
my curiosity, and hence this result:
> predict(cars.lm, newdata = data.frame(x = 1:32))
       1        2        3        4        5        6        7        8        9       10
20.37954 20.37954 26.58543 17.70329 14.91157 18.60448 14.91157 25.52859 25.68971 20.17199
      11       12       13       14       15       16       17       18       19       20
20.17199 17.21096 17.21096 17.21096 11.85300 12.18071 12.72688 27.38558 27.46750 27.59312
      21       22       23       24       25       26       27       28       29       30
26.25500 16.05853 16.44085 15.18466 13.81922 27.37738 26.24954 26.93772 15.15735 20.78917
      31       32
16.52278 26.23042
Again, the new data are ignored, but this time there is no warning, because
the check behind the earlier warning only compares the number of rows in
newdata with the number of rows in the variables found, and here both are 32.
Indeed, the new data set makes no sense at all in the context of this model.
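At the root of this behavior is the fact that model.frame() effectively ignores
its data argument for such models: expressions such as mtcars[[1]] and
mtcars$cyl name the original data frame explicitly, so they evaluate to its
32-row columns whatever data frame is supplied. So instead of constructing a
new frame based on the new data, it just returns the original model frame.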
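For contrast, here is a minimal sketch of the same model written in the usual
way, with bare variable names and a data argument (the name cars.lm2 is just
for illustration). In that case model.frame() does build its frame from the new
data, so predict() returns one value per new row:

```r
## Same model as cars.lm, but with bare variable names and a 'data' argument
## (cars.lm2 is an illustrative name)
cars.lm2 <- lm(mpg ~ factor(cyl) + disp, data = mtcars)

## Build a model frame for 5 new rows, dropping the response as predict.lm() does
mf <- model.frame(delete.response(terms(cars.lm2)), mtcars[1:5, ],
                  xlev = cars.lm2$xlevels)
nrow(mf)                                              # 5

## One prediction per row of newdata, and no warning
length(predict(cars.lm2, newdata = mtcars[1:5, ]))    # 5
```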
I am not really suggesting that you try to make such models work with new
data. Instead, I am hoping that R will throw an actual error rather than just a
warning, and that the check will be a little more sophisticated than merely
comparing numbers of rows. Both predict() with newdata provided and
model.frame() with a data argument should signal an informative error saying
that model formulas like this are not supported with new data. Here is what
appears to be an easy way to check:
> get_all_vars(terms(cars.lm))
Error in eval(inp, data, env) : object 'cyl' not found
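A related check could be made against the new data directly. Below is a
minimal sketch, assuming a hypothetical helper check_newdata() (not part of R)
that refuses to predict when any name on the right-hand side of the formula is
not a column of newdata. Note that this is stricter than what R currently
allows, since it would also reject formulas that legitimately use objects from
the environment (a global constant, say):

```r
## Hypothetical helper, not part of base R: stop with an informative error
## when the right-hand side of a model formula refers to names that are not
## columns of 'newdata' (e.g. mtcars$cyl refers to the object 'mtcars').
check_newdata <- function(object, newdata) {
  rhs <- delete.response(terms(object))
  vars <- all.vars(rhs)
  missing_vars <- setdiff(vars, names(newdata))
  if (length(missing_vars) > 0L)
    stop("model formula refers to ",
         paste(sQuote(missing_vars), collapse = ", "),
         ", not found in 'newdata'; predictions with new data",
         " are not supported for this model", call. = FALSE)
  invisible(TRUE)
}

cars.lm <- lm(mtcars[[1]] ~ factor(mtcars$cyl) + mtcars[["disp"]])
try(check_newdata(cars.lm, mtcars[1:5, ]))         # error naming 'mtcars'
try(check_newdata(cars.lm, data.frame(x = 1:32)))  # error naming 'mtcars' and 'cyl'

## A conventionally written model passes the check
check_newdata(lm(mpg ~ factor(cyl) + disp, data = mtcars), mtcars[1:5, ])
```

The check works from all.vars(), which for cars.lm yields "mtcars" and "cyl";
neither is a column name that a user's new data frame would normally supply.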
Thanks
Russ
Russell V. Lenth - Professor Emeritus
Department of Statistics and Actuarial Science
The University of Iowa - Iowa City, IA 52242 USA
Voice (319)335-0712 (Dept. office) - FAX (319)335-3017
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel