[Rd] Apparent bug in behavior of formulas with '-' operator for lm

Mark van der Loo Fri, 16 Mar 2018 02:22:30 -0700

Dear R-developers,

In the 'lm' documentation, the '-' operator is only specified to be used
with -1 (to remove the intercept from the model).


However, the documentation also refers to the 'formula' help file, which
indicates that it is possible to subtract any term. Indeed, the following
works with no problems (the period '.' stands for 'all terms except the
lhs'):

d <- data.frame(x=rnorm(6), y=rnorm(6), z=letters[1:2])
m <- lm(x ~ . -z, data=d)
p <- predict(m,newdata=d)

Now, if I change 'z' so that it has only unique values, and I introduce an
NA in the predicted variable, the following happens:

d <- data.frame(x=rnorm(6),y=rnorm(6),z=letters[1:6])
d$x[1] <- NA
m <- lm(x ~ . -z, data=d)
p <- predict(m, newdata=d)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
object$xlevels) : factor z has new levels a

It seems a bug to me, although one could argue that 'lm's documentation
does not allow one to expect that the '-' operator should work generally.

If it is a bug I'm happy to report it to bugzilla.

Thanks for all your efforts,
Mark

ps: I was not able to test this on R3.4.4 yet, but the NEWS does not
mention fixes related to 'lm' or 'predict'.


> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
LC_PAPER=nl_NL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
 LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    yaml_2.1.16

        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Apparent bug in behavior of formulas with '-' operator for lm

Reply via email to