Hi: (1) lm() drops columns in a rank-deficient model matrix X to make X'X nonsingular - this is called a full-rank reparameterization of the linear model.
(2) How many columns of X are dropped depends on its rank, which in turn depends on the number of constraints in the model matrix. This is related to the degrees of freedom associated with each term in the corresponding linear model.. The default contrasts are > options()$contrasts unordered ordered "contr.treatment" "contr.poly" Other choices include contr.helmert() and contr.sum(). See the help page ?contrasts for further details. See also section 6.2 of Venables and Ripley's _Modern Applied Statistics with S, 4th ed._ for further information on the connection between the columns of the model matrix in ANOVA and the defined sets of contrasts. Under the default contrasts, the first column is dropped for the main effect terms. Here's a simple example of a balanced 2-way ANOVA with interaction: d <- data.frame(a = factor(rep(letters[1:3], each = 4)), b = factor(rep(rep(1:2, each = 2), 3)), c = rnorm(12)) > d a b c 1 a 1 -0.77367688 2 a 1 -0.79069791 3 a 2 0.69257133 4 a 2 2.46788204 5 b 1 0.38892289 6 b 1 -0.03521033 7 b 2 -0.01071611 8 b 2 -0.74209425 9 c 1 1.36974281 10 c 1 -1.22775441 11 c 2 0.29621976 12 c 2 0.28208192 m <- aov(c ~ a * b, data = d) model.matrix(m) (Intercept) ab ac b2 ab:b2 ac:b2 1 1 0 0 0 0 0 2 1 0 0 0 0 0 3 1 0 0 1 0 0 4 1 0 0 1 0 0 5 1 1 0 0 0 0 6 1 1 0 0 0 0 7 1 1 0 1 1 0 8 1 1 0 1 1 0 9 1 0 1 0 0 0 10 1 0 1 0 0 0 11 1 0 1 1 0 1 12 1 0 1 1 0 1 attr(,"assign") [1] 0 1 1 2 3 3 attr(,"contrasts") attr(,"contrasts")$a [1] "contr.treatment" attr(,"contrasts")$b [1] "contr.treatment" Notice that the first column of each main effect is dropped, and that the interaction columns are the products of the retained a columns with the retained b columns. The attr() components verify that the contrast method used for this matrix is the default contr.treatment. > anova(m) Analysis of Variance Table Response: c Df Sum Sq Mean Sq F value Pr(>F) a 2 0.5001 0.25003 0.2827 0.7633 b 1 1.3700 1.36999 1.5489 0.2597 a:b 2 4.5647 2.28235 2.5804 0.1554 Residuals 6 5.3070 0.88450 Examination of the degrees of freedom tells us that there are two independent contrasts for a, one independent contrast for b and two independent contrasts for the interaction of a and b, which are shown in model.matrix(m) above. If you want to reset the baselline factor level, see ?relevel. Also look into the contrast package on CRAN. HTH, Dennis On Wed, Jul 21, 2010 at 8:40 AM, Anirban Mukherjee <am...@cornell.edu>wrote: > Hi all, > > If presented with a singular design matrix, lm drops columns to make the > design matrix non-singular. What algorithm is used to select which (and how > many) column(s) to drop? Particularly, given a factor, how does lm choose > levels of the factor to discard? > > Thanks for the help. > > Best, > Anirban > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.