Dear Doug and Gang Chen,

With balanced data and sum-to-zero contrasts, the intercept is indeed the general mean of the response; the coefficient of a1 is the mean of the response at level 1 of a minus the general mean; the coefficient of a1:b1 is the mean of the response in cell (a1, b1) minus the sum of the general mean and the coefficients of a1 and b1; etc. More generally, for unbalanced as well as balanced data, the intercept is the mean of the cell means; the coefficient of a1 is the mean of the cell means at level 1 of a minus the intercept; etc. Whether all this is of interest is another question, since a simple graph of cell means tells a more digestible story about the data.
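The unbalanced case can be checked numerically. What follows is only a minimal sketch: it reuses the gl() layout from the quoted example, but the duplicated rows, the simulated response y, and the object names (fit, cell, mu, a1, b1, a1b1) are illustrative assumptions, not part of the original question. It assumes every cell has at least one observation.

```r
# Hypothetical unbalanced two-way layout: start from the balanced 3 x 4
# design in the quoted example and duplicate a few rows (an assumption
# made purely for illustration).
set.seed(1)
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))
dd <- dd[c(1:12, 1, 1, 6, 11), ]
dd$y <- rnorm(nrow(dd), mean = as.integer(dd$a) * as.integer(dd$b))

# Saturated model with sum-to-zero contrasts for both factors
fit <- lm(y ~ a * b, data = dd,
          contrasts = list(a = "contr.sum", b = "contr.sum"))

# 3 x 4 table of cell means (rows: levels of a, columns: levels of b)
cell <- with(dd, tapply(y, list(a, b), mean))

mu   <- mean(cell)                 # mean of the cell means
a1   <- mean(cell[1, ]) - mu       # mean of cell means at a = 1, minus mu
b1   <- mean(cell[, 1]) - mu       # mean of cell means at b = 1, minus mu
a1b1 <- cell[1, 1] - mu - a1 - b1  # cell mean minus mu and both main effects

# Side-by-side comparison of fitted coefficients and hand computations
cbind(coef    = coef(fit)[c("(Intercept)", "a1", "b1", "a1:b1")],
      by_hand = c(mu, a1, b1, a1b1))
```

The two columns should agree up to rounding error. Note that mu here is the unweighted mean of the 12 cell means, which in unbalanced data generally differs from mean(dd$y).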
Regards,
 John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Douglas Bates
> Sent: January-25-09 10:49 AM
> To: Gang Chen
> Cc: R-help
> Subject: Re: [R] Interpreting model matrix columns when using contr.sum
>
> On Fri, Jan 23, 2009 at 4:58 PM, Gang Chen <gangch...@gmail.com> wrote:
> > With the following example using contr.sum for both factors,
> >
> >> dd <- data.frame(a = gl(3,4), b = gl(4,1,12)) # balanced 2-way
> >> model.matrix(~ a * b, dd, contrasts = list(a="contr.sum", b="contr.sum"))
> >
> >    (Intercept) a1 a2 b1 b2 b3 a1:b1 a2:b1 a1:b2 a2:b2 a1:b3 a2:b3
> > 1            1  1  0  1  0  0     1     0     0     0     0     0
> > 2            1  1  0  0  1  0     0     0     1     0     0     0
> > 3            1  1  0  0  0  1     0     0     0     0     1     0
> > 4            1  1  0 -1 -1 -1    -1     0    -1     0    -1     0
> > 5            1  0  1  1  0  0     0     1     0     0     0     0
> > 6            1  0  1  0  1  0     0     0     0     1     0     0
> > 7            1  0  1  0  0  1     0     0     0     0     0     1
> > 8            1  0  1 -1 -1 -1     0    -1     0    -1     0    -1
> > 9            1 -1 -1  1  0  0    -1    -1     0     0     0     0
> > 10           1 -1 -1  0  1  0     0     0    -1    -1     0     0
> > 11           1 -1 -1  0  0  1     0     0     0     0    -1    -1
> > 12           1 -1 -1 -1 -1 -1     1     1     1     1     1     1
> > ...
>
> > I have two questions:
>
> > (1) I assume the 1st column (under intercept) is the overall mean, the
> > 2nd column (under a1) is the difference between the 1st level of
> > factor a and the overall mean, the 4th column (under b1) is the
> > difference between the 1st level of factor b and the overall mean.
>
> > Is this interpretation correct?
>
> I don't think so and furthermore I don't see why the contrasts should
> have an interpretation. The contrasts are simply a parameterization
> of the space spanned by the indicator columns of the levels of the
> factors. Interpretations as overall means, etc. are mostly a holdover
> from antiquated concepts of how analysis of variance tables should be
> evaluated.
>
> If you want to determine the interpretation of particular coefficients
> for the special case of a balanced design (which doesn't always mean a
> resulting balanced data set - I remind my students that expecting a
> balanced design to produce balanced data is contrary to Murphy's Law)
> the easiest way of doing so is (I think this is right but I can
> somehow manage to confuse myself on this with great ease) to calculate
>
> > contr.sum(3)
>   [,1] [,2]
> 1    1    0
> 2    0    1
> 3   -1   -1
> > solve(cbind(1, contr.sum(3)))
>               1          2          3
> [1,]  0.3333333  0.3333333  0.3333333
> [2,]  0.6666667 -0.3333333 -0.3333333
> [3,] -0.3333333  0.6666667 -0.3333333
> > solve(cbind(1, contr.sum(4)))
>          1     2     3     4
> [1,]  0.25  0.25  0.25  0.25
> [2,]  0.75 -0.25 -0.25 -0.25
> [3,] -0.25  0.75 -0.25 -0.25
> [4,] -0.25 -0.25  0.75 -0.25
>
> That is, the first coefficient is the "overall mean" (but only for a
> balanced data set), the second is a contrast of the first level with
> the others, the third is a contrast of the second level with the
> others and so on.
>
> > (2) I'm not so sure about those interaction columns. For example, what
> > is a1:b1? Is it the 1st level of factor a at the 1st level of factor b
> > versus the overall mean, or something more complicated?
>
> Well, at the risk of sounding trivial, a1:b1 is the product of the a1
> and b1 columns. You need a basis for a certain subspace and this
> provides one. I don't see why there must be interpretations of the
> coefficients.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.