Dear R community,

I am new to R, a reforming SAS user :)  I am running R 2.10.1 on a Windows XP 
machine.   I would like to write linear functions of my coefficient parameter 
estimates from a glm, but am having a difficult time understanding the 
parameterization R uses.   In the toy example below I am running a glm on 
binomial data, with clones and lines within clones as fixed effects, each with 
6 replicates.  I cannot figure out the algorithm R uses for determining which 
combination of levels is used as the reference.  Crawley, in the chapter on 
Statistical Modeling in The R Book, states: "The writers of R agree that 
treatment contrasts represent the best solution.  This method does away with 
parameter a, the overall mean. The mean of the factor level that comes first in 
the alphabet (control, in our example) is promoted to pole position, and the 
other effects are shown as differences (contrasts) between this mean and the 
other four factor level means."
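
A minimal sketch of the treatment-contrast behaviour described in the quote,
using a hypothetical three-level factor (toy_f and toy_y are made up for
illustration and are not part of the data in this post).  It also shows that
factor levels built from numeric-looking labels are sorted as character
strings, which matters for deciding which level comes "first".

```r
# Hypothetical toy data, not from the study in this post
toy_f <- factor(c("control", "drugA", "drugB", "control", "drugA", "drugB"))
toy_y <- c(1, 3, 5, 2, 4, 6)

levels(toy_f)            # the first level listed is the reference level
getOption("contrasts")   # default codings for unordered and ordered factors
contrasts(toy_f)         # rows = levels, columns = dummy variables

toy_fit <- lm(toy_y ~ toy_f)
coef(toy_fit)            # (Intercept) = mean of "control" = 1.5;
                         # the other two are differences from it (2 and 4)

# The reference level can be changed explicitly
toy_f2 <- relevel(toy_f, ref = "drugB")
coef(lm(toy_y ~ toy_f2)) # (Intercept) is now the mean of "drugB"

# Labels that look numeric are still sorted as strings
levels(factor(c("91", "104", "14")))   # "104" "14" "91"
```

With labels such as 91, 104 and 14 stored as a factor, the "first" level is
therefore the first in string order, not the smallest number.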

This pattern seems to hold for full factorials, but it doesn't appear to work 
with this example, in which Line is nested within Clone.  Here, R 
appears to use the last line in alphabetical order of the first clone (Clone 1) as 
the intercept.  Thereafter, the reference levels for the clones are the last lines 
of Clones 2 and 3, but the first line (Line 1) of Clone 4.  The first line of 
Clone 4 is also the first line over all lines.  Can someone please explain 
what process R uses to determine the parameterization of this model?  Of 
course, I know what the parameterization is in this model and can write the 
functions I need.  However, I am trying to understand the algorithm R uses to 
determine the parameterization.  This is only a toy example of a much larger 
study and I would like to know when I should use the last level as the 
reference and when I should use the first.

Many thanks,
Manuela

tmp <- data.frame(Cl=rep(1:4,c(12,18,18,18)),
                   L =rep(c(91,104,14,91,96,"84-1",96,116,1,9,14),each=6),
                   N =rep(10,(12+18*3)),
                   D =c(1,6,8,1,1,1,2,6,10,3,3,1,2,1,1,0,4,4,6,5,3,5,1,3,
                        1,6,8,5,5,3,5,2,1,0,4,5,1,0,2,3,6,7,4,0,2,5,3,8,1,
                        4,7,0,6,3,7,2,3,6,1,9,7,2,1,3,0,1) )
tmp$N[c(13,15,59)] <- c(8,9,9)  # not always 10 trials
tmp$Cl <- factor(tmp$Cl)            # Make sure clone and line within clone are factors
tmp$L  <- factor(tmp$L)

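model <- formula(cbind(D, N - D) ~ Cl/L)   # Cl/L expands to Cl + Cl:L
tmp.glm <- glm(data=tmp, formula=model, family=quasibinomial(link="logit"))

with(tmp, table(L, Cl))               # which lines occur in which clone
summary(tmp.glm)$coefficients         # summary() dispatches to summary.glm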

unique(tmp[order(tmp$L),1:2])
# The coeff estimates R gives are in this order,
#   but R is using the rows c(1,8,10,11) as reference
# Why?
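
One way to look into the "Why?" above is to inspect the design matrix R
builds for a nested formula and see which columns the fit drops.  The sketch
below uses a small hypothetical layout (the data frame toy and its values are
invented for illustration) and lm() rather than glm(), on the assumption that
the column coding depends only on the formula and the factors, not on the
family.

```r
# Hypothetical nested layout: clone 1 has lines b and c, clone 2 has lines a and b
toy <- data.frame(
  Cl = factor(rep(1:2, each = 4)),
  L  = factor(c("b", "b", "c", "c", "a", "a", "b", "b")),
  y  = c(1, 1, 3, 3, 10, 10, 14, 14)
)

# Cl/L is shorthand for Cl + Cl:L
X <- model.matrix(y ~ Cl/L, data = toy)
colnames(X)        # one Cl:L column per clone and per non-first level of L
colSums(X != 0)    # columns for absent clone-by-line combinations are all zero
qr(X)$rank         # fewer independent columns than ncol(X)

fit_nested <- lm(y ~ Cl/L, data = toy)
coef(fit_nested)   # NA marks the columns the fit dropped as redundant
```

In this sketch, clone 2 contains the first level of L ("a"), so "a" acts as
its baseline.  Clone 1 does not contain "a", so its line columns are jointly
redundant with the clone indicator and the later one (line "c") is dropped,
which leaves the last line as that clone's baseline.  The same calls can be
applied to the formula and tmp.glm from the post to compare.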

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
