Dear R-developers,

In the 'lm' documentation, the '-' operator is only specified to be used with -1 (to remove the intercept from the model).
However, the documentation also refers to the 'formula' help file, which indicates that it is possible to subtract any term. Indeed, the following works with no problems (the period '.' stands for 'all terms except the lhs'):

d <- data.frame(x=rnorm(6), y=rnorm(6), z=letters[1:2])
m <- lm(x ~ . -z, data=d)
p <- predict(m,newdata=d)

Now, if I change 'z' so that it has only unique values, and I introduce an NA in the predicted variable, the following happens:

d <- data.frame(x=rnorm(6),y=rnorm(6),z=letters[1:6])
d$x[1] <- NA
m <- lm(x ~ . -z, data=d)
p <- predict(m, newdata=d)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor z has new levels a

It seems a bug to me, although one could argue that 'lm's documentation does not allow one to expect that the '-' operator should work generally. If it is a bug I'm happy to report it to bugzilla.

Thanks for all your efforts,
Mark

ps: I was not able to test this on R3.4.4 yet, but the NEWS does not mention fixes related to 'lm' or 'predict'.

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=nl_NL.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=nl_NL.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3 yaml_2.1.16

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
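
A possible workaround, offered only as a sketch and not as a statement about whether the '-' behaviour is a bug: the error message suggests that 'z' is still looked up when predicting, so one option is to drop 'z' from the data before fitting rather than subtracting it in the formula. The names 'm2' and 'p2', the set.seed() call, and the column selection via setdiff() are assumptions for illustration and do not come from the report above.

```r
# Sketch of a workaround (an assumption, not part of the original report):
# remove 'z' from the data passed to lm() instead of subtracting it in the formula.
set.seed(1)  # only to make the illustration reproducible
d <- data.frame(x = rnorm(6), y = rnorm(6), z = letters[1:6])
d$x[1] <- NA

# Fit on every column except 'z'; '.' now expands to 'y' only.
m2 <- lm(x ~ ., data = d[setdiff(names(d), "z")])

# newdata may still contain 'z'; predict() only needs the variables
# that appear in the fitted model's terms (here just 'y').
p2 <- predict(m2, newdata = d)
```

With this variant the fitted model has no record of 'z' at all, so the level check that triggers the error should not come into play.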