I wonder if someone could explain the behavior of the anova() and lm() functions in the following situation:
I have a standard 3x2 factorial design, factorA has 3 levels, factorB has 2 levels, they are fully crossed. I have a dependent variable DV. Of course I can do the following to get the usual anova table:

> anova(lm(DV~factorA+factorB+factorA:factorB))
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorA          2  7.4667  3.7333  4.9778 0.015546 *
factorB          1  2.1333  2.1333  2.8444 0.104648
factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
Residuals       24 18.0000  0.7500

This is perfectly satisfactory for my situation, but as a pedagogical exercise, I wanted to demonstrate the model comparison approach to analysis of variance by using anova() to compare a full model that contains all effects, to restricted models that contain all effects save for the effect of interest.

The test of the interaction effect seems to be as I expected:

> fullmodel<-lm(DV~factorA+factorB+factorA:factorB)
> restmodel<-lm(DV~factorA+factorB)
> anova(fullmodel,restmodel)
Analysis of Variance Table

Model 1: DV ~ factorA + factorB + factorA:factorB
Model 2: DV ~ factorA + factorB
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)
1     24 18.0000
2     26 27.8667 -2   -9.8667 6.5778 0.005275 **

As you can see the value of F (6.5778) is the same as in the anova table above. All is well.

However, if I try to test a main effect, e.g.
factorA, by testing the full model against a restricted model that doesn't contain the main effect factorA, I get something strange:

> restmodel<-lm(DV~factorB+factorA:factorB)
> anova(fullmodel,restmodel)
Analysis of Variance Table

Model 1: DV ~ factorA + factorB + factorA:factorB
Model 2: DV ~ factorB + factorA:factorB
  Res.Df RSS Df Sum of Sq F Pr(>F)
1     24  18
2     24  18  0         0

Upon inspection of each model I see that the Residuals are identical, which is not what I was expecting:

> anova(fullmodel)
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorA          2  7.4667  3.7333  4.9778 0.015546 *
factorB          1  2.1333  2.1333  2.8444 0.104648
factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
Residuals       24 18.0000  0.7500

This looks fine, but then the restricted model is where things are not as I expected:

> anova(restmodel)
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorB          1  2.1333  2.1333  2.8444 0.104648
factorB:factorA  4 17.3333  4.3333  5.7778 0.002104 **
Residuals       24 18.0000  0.7500

I was expecting the Residuals in the restricted model (the one not containing main effect of factorA) to be larger than in the full model containing all three effects. In other words, the variance accounted for by the main effect factorA should be added to the Residuals. Instead, it looks like the variance accounted for by the main effect of factorA is being soaked up by the factorA:factorB interaction term. Strangely, the degrees of freedom are also affected.

I must be misunderstanding something here. Can someone point out what is happening?

Thanks,

-Paul

-- 
Paul L. Gribble, Ph.D.
Associate Professor
Dept. Psychology
The University of Western Ontario
London, Ontario
Canada N6A 5C2
Tel. +1 519 661 2111 x82237
Fax. +1 519 661 3961
pgrib...@uwo.ca
http://gribblelab.org
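
P.S. For anyone who wants to reproduce the puzzle, here is a minimal, self-contained sketch. It assumes simulated data: the data frame dat, the level labels, the seed, and the 5 observations per cell are made up for illustration (chosen only so that the full model has 24 residual df, as in the tables in this post), so the sums of squares will not match the numbers in the post. Besides refitting the two models, it prints the rank and column names of each design matrix, which may be a useful place to start looking. Given the identical residual degrees of freedom in the model comparison for factorA, I would expect the two ranks to match.

```r
# Hypothetical data: simulated, NOT the data behind the tables in this post.
set.seed(1)
dat <- expand.grid(rep     = 1:5,                 # 5 observations per cell (assumed)
                   factorA = c("a1", "a2", "a3"), # 3 levels, labels made up
                   factorB = c("b1", "b2"))       # 2 levels, labels made up
dat$factorA <- factor(dat$factorA)
dat$factorB <- factor(dat$factorB)
dat$DV <- rnorm(nrow(dat))                        # arbitrary response values

# Full model, and the restricted model that drops only the factorA main effect
fullmodel <- lm(DV ~ factorA + factorB + factorA:factorB, data = dat)
restmodel <- lm(DV ~ factorB + factorA:factorB, data = dat)

# Number of linearly independent columns in each design matrix
rank_full <- qr(model.matrix(fullmodel))$rank
rank_rest <- qr(model.matrix(restmodel))$rank
print(c(full = rank_full, restricted = rank_rest))

# Residual df and residual sum of squares for both fits
print(c(full = df.residual(fullmodel), restricted = df.residual(restmodel)))
print(c(full = deviance(fullmodel), restricted = deviance(restmodel)))

# Do the two fits produce the same fitted values?
print(all.equal(fitted(fullmodel), fitted(restmodel)))

# Which columns did R actually build for each formula?
print(colnames(model.matrix(fullmodel)))
print(colnames(model.matrix(restmodel)))

# The comparison from the post
print(anova(fullmodel, restmodel))
```

Comparing the two sets of column names shows how the factorA:factorB term is coded once the factorA main effect is left out of the formula.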