Dear R-devel list,

I noticed that removing a predictor in lm() with the "-" operator in the
formula does not change which complete cases are used. A minimal
example is:

summary(lm(Wind ~ ., data = airquality))
# 42 observations deleted due to missingness

summary(lm(Wind ~ . - Ozone, data = airquality))
# still 42 observations deleted due to missingness, even though only 7 rows
# have missing values in the response and the remaining predictors

summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
# 7 observations deleted due to missingness
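
The counts above can be checked programmatically, and the model frame of the
second fit can be inspected to see which columns take part in the NA handling.
This is only a minimal sketch using base R and the built-in airquality data;
the object names (fit_full, fit_minus, fit_subset, dropped) are illustrative,
and it describes what I observe rather than the documented rationale.

```r
# Fit the three models from the example above
fit_full   <- lm(Wind ~ ., data = airquality)
fit_minus  <- lm(Wind ~ . - Ozone, data = airquality)
fit_subset <- lm(Wind ~ ., data = subset(airquality, select = -Ozone))

# Rows dropped by the default na.action = total rows - rows actually used
n <- nrow(airquality)
dropped <- c(full   = n - nobs(fit_full),
             minus  = n - nobs(fit_minus),
             subset = n - nobs(fit_subset))
print(dropped)

# Columns of the model frame built for the "- Ozone" fit; if Ozone is listed
# here, its NAs are considered even though it is not a term in the model
print(names(model.frame(fit_minus)))

# Ozone does not appear among the estimated coefficients
print(names(coef(fit_minus)))
```

If Ozone shows up in the model frame but not among the coefficients, that would
be consistent with the missing values being filtered before the term is
removed.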

I find this behaviour somewhat surprising, and I was wondering whether it is
intended, or whether it would be appropriate to document it in lm's help.

Any insight on this issue is appreciated.

Best regards,
-- 
Eduardo García Portugués
Assistant professor
Department of Statistics
Carlos III University of Madrid

Office: 7.3.J21 (Leganés)
Phone: (+34) 91624 8836


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
