[R] Understanding linear contrasts in Anova using R

David Howell Wed, 29 Sep 2010 12:42:51 -0700

 #I am trying to understand how R fits models for contrasts in a
#simple one-way anova. This is an example, I am not stupid enough to want
#to simultaneously apply all of these contrasts to real data. With a few
#exceptions, the tests that I would compute by hand (or by other software)

#will give the same t or F statistics. It is the contrast estimates thatR produces

#that I can't seem to understand.
#

# In searching for answers to this problem, I found a great PowerPointslide (I think by John Fox).# The slide pointed to the coefficients, said something like "these arecoeff. that no one could love," and#then suggested looking at the means to understand where they came from.I have stared

# and stared at his means and then my means, but can't find a relationship.


# The following code and output illustrates the problem.

# Various examples of Anova using R

dv <- c(1.28, 1.35, 3.31, 3.06, 2.59, 3.25, 2.98, 1.53, -2.68,2.64, 1.26, 1.06,-1.18, 0.15, 1.36, 2.61, 0.66, 1.32, 0.73, -1.06, 0.24,0.27, 0.72, 2.28,-0.41, -1.25, -1.33, -0.47, -0.60, -1.72, -1.74, -0.77, -0.41,-1.20, -0.31, -0.74,-0.45, 0.54, -0.98, 1.68, 2.25, -0.19, -0.90, 0.78, 0.05,2.69, 0.15, 0.91,2.01, 0.40, 2.34, -1.80, 5.00, 2.27, 6.47, 2.94, 0.47,3.22, 0.01, -0.66)


group <- factor(rep(1:5, each = 12))


# Use treatment contrasts to compare each group to the first group.
options(contrasts = c("contr.treatment","contr.poly"))  # The default
model2 <- lm(dv ~ group)
summary(model2)
  # Summary table is the same--as it should be
  # Intercept is Group 1 mean and other coeff. are deviations from that.
  # This is what I would expect.
  #summary(model1)
  #              Df Sum Sq Mean Sq F value    Pr(>F)
  #  group        4  62.46 15.6151  6.9005 0.0001415 ***
  #  Residuals   55 124.46  2.2629
  #Coefficients:
  #            Estimate Std. Error t value Pr(>|t|)
  #(Intercept)  1.80250    0.43425   4.151 0.000116 ***
  #group2      -1.12750    0.61412  -1.836 0.071772 .
  #group3      -2.71500    0.61412  -4.421 4.67e-05 ***
  #group4      -1.25833    0.61412  -2.049 0.045245 *
  #group5       0.08667    0.61412   0.141 0.888288


# Use sum contrasts to compare each group against grand mean.
options(contrasts = c("contr.sum","contr.poly"))
model3 <- lm(dv ~ group)
summary(model3)

# Again, this is as expected. Intercept is grand mean and others aredeviatoions from that.

  #Coefficients:
  #              Estimate Std. Error t value Pr(>|t|)
  #  (Intercept)   0.7997     0.1942   4.118 0.000130 ***
  #  group1        1.0028     0.3884   2.582 0.012519 *
  #  group2       -0.1247     0.3884  -0.321 0.749449
  #  group3       -1.7122     0.3884  -4.408 4.88e-05 ***
  #  group4       -0.2555     0.3884  -0.658 0.513399

#SO FAR, SO GOOD

# IF I wanted polynomial contrasts BY HAND I would use

# a(i) = -2 -1 0 1 2 for linear contrast (or somelinear function of this )

#    Effect = Sum(a(j)M(i))    # where M = mean

# Effect(linear) = -2(1.805) -1(0.675) +0(-.912) +1(.544) +2(1.889) =0.043

#    SS(linear) = n*(Effect(linear)^2)/Sum((a(j)^2))  = 12(.043)/10 = .002
#    F(linear) = SS(linear)/MS(error) = .002/2.263 = .001
#    t(linear) = sqrt(.001) = .031

# To do this in R I would use
order.group <- ordered(group)
model4 <- lm(dv~order.group)
summary(model4)
#  This gives:
    #Coefficients:
#                  Estimate Std. Error t value Pr(>|t|)
#    (Intercept)    0.79967    0.19420   4.118 0.000130 ***
#    order.group.L  0.01344    0.43425   0.031 0.975422
#    order.group.Q  2.13519    0.43425   4.917 8.32e-06 ***
#    order.group.C  0.11015    0.43425   0.254 0.800703
#    order.group^4 -0.79602    0.43425  -1.833 0.072202 .

# The t value for linear is same as I got (as are others) but I don'tunderstand# the estimates. The intercept is the grand mean, but I don't see therelationship

# of other estimates to that or to the ones I get by hand.

# My estimates are the sum of (coeff times means) i.e. 0 (intercept),.0425, 7.989, .3483, -6.66

# and these are not a linear (or other nice pretty) function of est. from R.

# Just as a check, I forced intercept through zero with with deviationscores or -1 in model.

#  Now force intercept to 0 by using deviation scores

devdv <- dv-mean(dv)
model5 <- lm(devdv~order.group)
summary(model5)
#Same as above except intercept = 0

# Now do it by removing the intercept
model6 <- lm(dv~order.group -1)
summary(model6)
#  Weird because coeff = cell means
    #             Estimate Std. Error t value Pr(>|t|)
#    order.group1   1.8025     0.4342   4.151 0.000116 ***
#    order.group2   0.6750     0.4342   1.554 0.125824
#    order.group3  -0.9125     0.4342  -2.101 0.040208 *
#    order.group4   0.5442     0.4342   1.253 0.215464
#    order.group5   1.8892     0.4342   4.350 5.94e-05 ***

# BUT note that row labels would suggest that we were not solving forpolynomials,

# but the contrast matrix is made up of polynomial coefficients.

# contrasts(order.group)
#                .L         .Q            .C         ^4
#[1,] -6.324555e-01  0.5345225 -3.162278e-01  0.1195229
#[2,] -3.162278e-01 -0.2672612  6.324555e-01 -0.4780914
#[3,] -3.287978e-17 -0.5345225  1.595204e-16  0.7171372
#[4,]  3.162278e-01 -0.2672612 -6.324555e-01 -0.4780914
#[5,]  6.324555e-01  0.5345225  3.162278e-01  0.1195229

#So, when I use polynomials (either through contr.poly or by supplying amatrix of

#coefficients, what do the estimates represent?

# My problem is not restricted to polynomials. If I try a set oforthogonal linear contrasts

# on group means I get

#contrasts(group) <- cbind(c(1,1,0,-1,-1), c(1,-1,0,0,0),c(-0,0,0,1,-1), c(1,1,4,-1,1))

#model3 <- lm(dv ~ group)
#summary(model3)
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)   1.5335     0.2558   5.995 1.64e-07 ***
#group1        0.3168     0.2279   1.390 0.170185
#group2        0.5638     0.3071   1.836 0.071772 .
#group3       -1.2840     0.3369  -3.811 0.000352 ***
#group4       -0.6115     0.1387  -4.408 4.88e-05 ***

#These are not even close to what I would expect. By hand I wouldcompute the contrasts as

# .0442, 1.1275, 1.3450, and 8.5608 with different t values.

# Any help would be appreciated.

Thanks,
Dave Howell

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Understanding linear contrasts in Anova using R

Reply via email to