Dear Tal, A complete explanation of this issue is too long for an email; I do address it in my Applied Regression Analysis and Generalized Linear Models text. The question seems to come up so frequently that Georges Monette and I are writing a paper on it (and related issues) for the upcoming useR conference.
Briefly, any set of "contrasts" that are orthogonal in the row basis of the model matrix (essentially, composed from what you see when you use the contrasts() function in R) will produce the same sums of squares (or, in the multivariate case, sums of squares and products). This include Hermert (contr.helmert), sigma-constrained (contr.sum), and orthogonal-polynomial (contr.poly) contrasts, but not dummy-coded "contrasts" (contr.treatment). (Actually, if you look carefully, you'll see that the contrasts defined for treatment in the OBrienKaiser data are custom orthogonal contrasts similar to Helmert contrasts.) Consequently, if all you're concerned with is the ANOVA table, it doesn't matter which of these you use. If, however, you're interested in the individual contrasts, it does of course matter which you use, and in particular the orthogonal polynomial contrasts are not sensible if the levels of the factor aren't ordered. Regards, John ------------------------------ John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox > -----Original Message----- > From: Tal Galili [mailto:tal.gal...@gmail.com] > Sent: February-15-09 5:16 AM > To: John Fox > Cc: Peter Dalgaard; Nils Skotara; r-help@r-project.org; Michael Friendly > Subject: Re: [R] Anova and unbalanced designs > > Dear John - thank you for your detailed answer and help. > > Your answer encourages me to ask further: by choosing different contrasts, > what are the different hypothesis which are being tested? (or put differently > - should I prefer contr.sum over contr.poly or contr.helmert, or does this > makes no difference ?) > How should this question be approached/answered ? > > I see in the ?contrasts in R that the referenced reading is: > "Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of > Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & > Brooks/Cole." > Yet I must admit I don't have this book readily available (not on the web, > nor in my local library), so other recommended sources would be of great > help. > > > For future reference I add here a some tinkering of the code to show how > implementing different contrasts will resort in different SS type III > analysis results: > > phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)), > levels=c("pretest", "posttest", "followup")) > hour <- ordered(rep(1:5, 3)) > idata <- data.frame(phase, hour) > > contrasted.treatment <- C(OBrienKaiser$treatment, "contr.treatment") > mod.ok.contr.treatment <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5, > post.1, post.2, post.3, post.4, post.5, > fup.1, fup.2, fup.3, fup.4, fup.5) ~ > contrasted.treatment*gender, data=OBrienKaiser) > contrasted.treatment <- C(OBrienKaiser$treatment, "contr.helmert") > mod.ok.contr.helmert <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5, > post.1, post.2, post.3, post.4, post.5, > fup.1, fup.2, fup.3, fup.4, fup.5) ~ > contrasted.treatment*gender, data=OBrienKaiser) > contrasted.treatment <- C(OBrienKaiser$treatment, "contr.poly") > mod.ok.contr.poly <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5, > post.1, post.2, post.3, post.4, post.5, > fup.1, fup.2, fup.3, fup.4, fup.5) ~ > contrasted.treatment*gender, data=OBrienKaiser) > contrasted.treatment <- C(OBrienKaiser$treatment, "contr.sum") > mod.ok.contr.sum <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5, > post.1, post.2, post.3, post.4, post.5, > fup.1, fup.2, fup.3, fup.4, fup.5) ~ > contrasted.treatment*gender, data=OBrienKaiser) > > # this is one result: > (Anova(mod.ok.contr.treatment, idata=idata, idesign=~phase*hour, type = > "III")) > # all of the other contrasts will now give the same outcome: (does that mean > there shouldn't be a preference of using one over the other ?) > (Anova(mod.ok.contr.helmert, idata=idata, idesign=~phase*hour, type = "III")) > (Anova(mod.ok.contr.poly, idata=idata, idesign=~phase*hour, type = "III")) > (Anova(mod.ok.contr.sum, idata=idata, idesign=~phase*hour, type = "III")) > > > > > With regards, > Tal > > > > > On Sat, Feb 14, 2009 at 7:09 PM, John Fox <j...@mcmaster.ca> wrote: > > > Dear Tal, > > > > -----Original Message----- > > From: Tal Galili [mailto:tal.gal...@gmail.com] > > Sent: February-14-09 10:23 AM > > To: John Fox > > > Cc: Peter Dalgaard; Nils Skotara; r-help@r-project.org; Michael > Friendly > > Subject: Re: [R] Anova and unbalanced designs > > > > > Hello John and other R mailing list members. > > > > I've been following your discussions regarding the Anova command for > the > SS > > type 2/3 repeated measures Anova, and I have a question: > > > > I found that when I go from using type II to using type III, the > summary > > model is suddenly added with an "intercept" term (example in the end > of > the > > e-mail). So my question is > > 1) why is this "intercept" term added (in SS type "III" vs the type > "II")? > > > The computational approach taken in Anova() makes it simpler to include > the > intercept in the "type-III" tests and not to include it in the "type- > II" > tests. > > > > 2) Can/should this "intercept" term be removed ? (or how should it be > > interpreted ?) > > > The test for the intercept is rarely of interest. A "type-II" test for > the > intercept would test that the unconditional mean of the response is 0; > a > "type-III" test for the intercept would test that the constant term in > the > full model fit to the data is 0. The latter depends upon the > parametrization > of the model (in the case of an ANOVA model, what kind of "contrasts" > are > used). You state that the example that you give is taken from ?Anova > but > there's a crucial detail that's omitted: The help file only gives the > "type-II" tests; the "type-III" tests are also reasonable here, but > they > depend upon having used "contr.sum" (or another set of contrasts that's > orthogonal in the row basis of the model matrix) for the between- > subject > factors, treatment and gender. This detail is in the data set: > > > OBrienKaiser$gender > [1] M M M F F M M F F M M M F F F F > attr(,"contrasts") > [1] contr.sum > Levels: F M > > > OBrienKaiser$treatment > [1] control control control control control A A A A > B B > [12] B B B B B > attr(,"contrasts") > [,1] [,2] > control -2 0 > A 1 -1 > B 1 1 > Levels: control A B > > With proper contrast coding, the "type-III" test for the intercept > tests > that the mean of the cell means (the "grand mean") is 0. > > Had the default dummy-coded contrasts (from contr.treatment) been used, > the > tests would not have tested reasonable hypotheses. My advice, from the > help > file: "Be very careful in formulating the model for type-III tests, or > the > > hypotheses tested will not make sense." > > > I hope this helps, > John > > > > > > My purpose is to be able to use the Anova for analyzing an experiment > with > a > > 2 between and 3 within factors, where the between factors are not > balanced, > > and the within factors are (that is why I can't use the aov command). > > > > > > #---code start > > > > #---code start > > > > #---code start > > > > # (taken from the ?Anova help file) > > > > phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, > 5)), > > levels=c("pretest", "posttest", "followup")) > > hour <- ordered(rep(1:5, 3)) > > idata <- data.frame(phase, hour) > > idata > > mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5, > > post.1, post.2, post.3, post.4, post.5, > > fup.1, fup.2, fup.3, fup.4, fup.5) ~ > treatment*gender, > > data=OBrienKaiser) > > > > # now we have two options > > # option one is to use type II: > > > > (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = > "II")) > > > > > > #output: > > Type II Repeated Measures MANOVA Tests: Pillai test statistic > > Df test stat approx F num Df den Df > Pr(>F) > > treatment 2 0.4809 4.6323 2 10 > 0.0376868 > * > > gender 1 0.2036 2.5558 1 10 > 0.1409735 > > treatment:gender 2 0.3635 2.8555 2 10 > 0.1044692 > > phase 1 0.8505 25.6053 2 9 > 0.0001930 > *** > > treatment:phase 2 0.6852 2.6056 4 20 > 0.0667354 > . > > gender:phase 1 0.0431 0.2029 2 9 > 0.8199968 > > treatment:gender:phase 2 0.3106 0.9193 4 20 > 0.4721498 > > hour 1 0.9347 25.0401 4 7 > 0.0003043 > *** > > treatment:hour 2 0.3014 0.3549 8 16 > 0.9295212 > > gender:hour 1 0.2927 0.7243 4 7 > 0.6023742 > > treatment:gender:hour 2 0.5702 0.7976 8 16 > 0.6131884 > > phase:hour 1 0.5496 0.4576 8 3 > 0.8324517 > > treatment:phase:hour 2 0.6637 0.2483 16 8 > 0.9914415 > > gender:phase:hour 1 0.6950 0.8547 8 3 > 0.6202076 > > treatment:gender:phase:hour 2 0.7928 0.3283 16 8 > 0.9723693 > > --- > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > > > > # option two is to use type III, and then get an added intercept > term: > > (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = > "III")) > > > > > > # here is the output: > > Type III Repeated Measures MANOVA Tests: Pillai test statistic > > Df test stat approx F num Df den Df > Pr(>F) > > (Intercept) 1 0.967 296.389 1 10 > 9.241e-09 > *** > > treatment 2 0.441 3.940 2 10 > 0.0547069 > . > > gender 1 0.268 3.659 1 10 > 0.0848003 > . > > treatment:gender 2 0.364 2.855 2 10 > 0.1044692 > > phase 1 0.814 19.645 2 9 > 0.0005208 > *** > > treatment:phase 2 0.696 2.670 4 20 > 0.0621085 > . > > gender:phase 1 0.066 0.319 2 9 > 0.7349696 > > treatment:gender:phase 2 0.311 0.919 4 20 > 0.4721498 > > hour 1 0.933 24.315 4 7 > 0.0003345 > *** > > treatment:hour 2 0.316 0.376 8 16 > 0.9183275 > > gender:hour 1 0.339 0.898 4 7 > 0.5129764 > > treatment:gender:hour 2 0.570 0.798 8 16 > 0.6131884 > > phase:hour 1 0.560 0.478 8 3 > 0.8202673 > > treatment:phase:hour 2 0.662 0.248 16 8 > 0.9915531 > > gender:phase:hour 1 0.712 0.925 8 3 > 0.5894907 > > treatment:gender:phase:hour 2 0.793 0.328 16 8 > 0.9723693 > > --- > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > > > > > > #---code end > > > > #---code end > > > > #---code end > > > > > > > > Thanks in advance for your help! > > Tal Galili > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Jan 25, 2009 at 3:08 AM, John Fox <j...@mcmaster.ca> wrote: > > > > > > Dear Peter and Nils, > > > > In my initial message, I stated misleadingly that the contrast > coding > > didn't > > matter for the "type-III" tests here since there is just one > > between-subjects factor, but that's not right: The between > type-III > SS > > is > > correct using contr.treatment(), but the within SS is not. As > is > > generally > > the case, to get reasonable type-III tests (i.e., tests of > reasonable > > hypotheses), it's necessary to have contrasts that are > orthogonal in > > the > > row-basis of the design, such as contr.sum(), contr.helmert(), > or > > contr.poly(). The "type-II" tests, however, are insensitive to > the > > contrast > > parametrization. Anova() always uses an orthogonal > parametrization > for > > the > > within-subjects design. > > > > The general advice in ?Anova is, "Be very careful in > formulating the > > model > > for type-III tests, or the hypotheses tested will not make > sense." > > > > Thanks, Peter, for pointing this out. > > > > > > John > > > > ------------------------------ > > John Fox, Professor > > Department of Sociology > > McMaster University > > Hamilton, Ontario, Canada > > web: socserv.mcmaster.ca/jfox > > > > > > > -----Original Message----- > > > > > From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk] > > > Sent: January-24-09 6:31 PM > > > To: Nils Skotara > > > Cc: John Fox; r-help@r-project.org; 'Michael Friendly' > > > Subject: Re: [R] Anova and unbalanced designs > > > > > > > > Nils Skotara wrote: > > > > Dear John, > > > > > > > > thank you again! You replicated the type III result I got > in > SPSS! > > When > > I > > > > calculate Anova() type II: > > > > > > > > Univariate Type II Repeated-Measures ANOVA Assuming > Sphericity > > > > > > > > SS num Df Error SS den Df F > Pr(>F) > > > > between 4.8000 1 9.0000 8 4.2667 > 0.07273 . > > > > within 0.2000 1 10.6667 8 0.1500 > 0.70864 > > > > between:within 2.1333 1 10.6667 8 1.6000 > 0.24150 > > > > --- > > > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' > ' 1 > > > > > > > > I see the exact same values as you had written. > > > > However, and now I am really lost, type III (I did not > change > > anything > > > else) > > > > leads to the following: > > > > > > > > Univariate Type III Repeated-Measures ANOVA Assuming > Sphericity > > > > > > > > SS num Df Error SS den Df > F > > Pr(>F) > > > > (Intercept) 72.000 1 9.000 8 > 64.0000 > > 4.367e-05 > > > *** > > > > between 4.800 1 9.000 8 > 4.2667 > > 0.07273 . > > > > as.factor(within) 2.000 1 10.667 8 > 1.5000 > > 0.25551 > > > > between:as.factor(within) 2.133 1 10.667 8 > 1.6000 > > 0.24150 > > > > --- > > > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' > ' 1 > > > > > > > > How is this possible? > > > > > > This looks like a contrast parametrization issue: If we look > at > the > > > per-group mean within-differences and their SE, we get > > > > > > > summary(lm(within1-within2~between - 1)) > > > .. > > > Coefficients: > > > Estimate Std. Error t value Pr(>|t|) > > > between1 -1.0000 0.8165 -1.225 0.256 > > > between2 0.3333 0.6667 0.500 0.631 > > > .. > > > > table(between) > > > between > > > 1 2 > > > 4 6 > > > > > > Now, the type II F test is based on weighting the two means > as you > > would > > > after testing for no interaction > > > > > > > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2) > > > [1] 0.1500205 > > > > > > and type III is to weight them as if there had been equal > counts > > > > > > > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2) > > > [1] 0.400022 > > > > > > However, the result above corresponds to looking at group1 > only > > > > > > > (-1)^2/(0.8165^2) > > > [1] 1.499987 > > > > > > It helps if you choose orhtogonal contrast parametrizations: > > > > > > > options(contrasts=c("contr.sum","contr.helmert")) > > > > betweenanova <- lm(values ~ between)> Anova(betweenanova, > > idata=with, > > > idesign= ~as.factor(within), type = "III" ) > > > > > > Type III Repeated Measures MANOVA Tests: Pillai test > statistic > > > Df test stat approx F num Df den > Df > > Pr(>F) > > > (Intercept) 1 0.963 209.067 1 8 > 5.121e- > > 07 > > *** > > > between 1 0.348 4.267 1 8 > > 0.07273 . > > > as.factor(within) 1 0.048 0.400 1 8 > > 0.54474 > > > between:as.factor(within) 1 0.167 1.600 1 8 > > 0.24150 > > > --- > > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' > 1 > > > > > > > > > > > > > > > -- > > > O__ ---- Peter Dalgaard Ă˜ster Farimagsgade > 5, > Entr.B > > > c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 > Cph. K > > > (*) \(*) -- University of Copenhagen Denmark Ph: > (+45) > > 35327918 > > > ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk) FAX: > (+45) > > 35327907 > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R- > project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible > code. > > > > > > > > > > > > -- > > ---------------------------------------------- > > > > > > My contact information: > > Tal Galili > > Phone number: 972-50-3373767 > > FaceBook: Tal Galili > > My Blogs: > > www.talgalili.com > > www.biostatistics.co.il > > > > > > > > > > > > > -- > ---------------------------------------------- > > > My contact information: > Tal Galili > Phone number: 972-50-3373767 > FaceBook: Tal Galili > My Blogs: > www.talgalili.com > www.biostatistics.co.il > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.