Dear R community,

I am new to R, a reforming SAS user :) I am running R 2.10.1 on a Windows XP machine. I would like to write linear functions of my coefficient parameter estimates from a glm, but I am having a difficult time understanding the parameterization R uses. In the toy example below I am running a glm on binomial data, with clones and lines within clones as fixed effects, each with 6 replicates. I cannot figure out the algorithm R uses for determining which combination of levels is used as the reference.

Crawley, in the chapter on Statistical Modeling in the R Book, states: "The writers of R agree that treatment contrasts represent the best solution. This method does away with parameter a, the overall mean. The mean of the factor level that comes first in the alphabet (control, in our example) is promoted to pole position, and the other effects are shown as differences (contrasts) between this mean and the other four factor level means."
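To make Crawley's description concrete, here is a minimal sketch of treatment contrasts in a one-way layout. The factor `trt`, its level names, and the response `y` are hypothetical values made up for illustration, and the sketch assumes the default `options("contrasts")` has not been changed.

```r
# Hypothetical five-level factor; "control" sorts first alphabetically.
trt <- factor(rep(c("control", "drugA", "drugB", "drugC", "drugD"), each = 3))
y   <- c(10, 11, 12, 14, 15, 16, 20, 21, 22, 8, 9, 10, 30, 31, 32)

levels(trt)[1]   # the level that treatment contrasts use as the reference
contrasts(trt)   # contr.treatment coding: the reference level has an all-zero row

fit   <- lm(y ~ trt)
means <- sapply(split(y, trt), mean)   # named vector of level means

# The intercept is the mean of the first level;
# every other coefficient is a difference from that mean.
coef(fit)
means[1]
means[-1] - means[1]
```

In this one-way case the intercept matches the mean of the first level and the remaining coefficients match the differences from it, which is the behaviour the quote describes.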
This pattern seems to hold for full factorials, but it does not appear to work in this example, in which Line is nested within Clone. Here, R appears to use the last line in alphabetical order of the first clone (Clone 1) as the intercept. Thereafter, the reference levels for the clones are the last lines of Clones 2 and 3, but the first line (Line 1) of Clone 4. The first line of Clone 4 is also the first line over all lines.

Can someone please explain what process R uses to determine the parameterization of this model? Of course, I know what the parameterization is in this model and can write the functions I need. However, I am trying to understand the algorithm R uses to determine the parameterization. This is only a toy example of a much larger study, and I would like to know when I should use the last level as the reference and when I should use the first.

Many thanks,
Manuela

tmp <- data.frame(
  Cl = rep(1:4, c(12, 18, 18, 18)),
  L  = rep(c(91, 104, 14, 91, 96, "84-1", 96, 116, 1, 9, 14), each = 6),
  N  = rep(10, (12 + 18 * 3)),
  D  = c(1,6,8,1,1,1,2,6,10,3,3,1,2,1,1,0,4,4,6,5,3,5,1,3,
         1,6,8,5,5,3,5,2,1,0,4,5,1,0,2,3,6,7,4,0,2,5,3,8,1,
         4,7,0,6,3,7,2,3,6,1,9,7,2,1,3,0,1)
)
tmp$N[c(13, 15, 59)] <- c(8, 9, 9)  # not always 10 trials
tmp$Cl <- factor(tmp$Cl)            # Make sure clone and line within clone are factors
tmp$L  <- factor(tmp$L)
model <- formula(cbind(D, (N - D)) ~ Cl/L)
tmp.glm <- glm(data = tmp, formula = model, family = quasibinomial(link = "logit"))
with(tmp, table(L, Cl))
summary.glm(tmp.glm)$coefficients
unique(tmp[order(tmp$L), 1:2])
# The coeff estimates R gives are in this order,
# but R is using the rows c(1,8,10,11) as reference
# Why?
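One way to look at the parameterization directly is to inspect the design matrix and see which coefficients come back as NA. The sketch below is not the data above: it uses a small hypothetical nested layout (data frame `d`, clones 1 and 2, lines "a", "b", "c") and `lm()` instead of `glm()` to keep it short, on the assumption that both fitting functions handle linearly dependent columns the same way.

```r
# Hypothetical nested layout: clone 1 has lines b and c, clone 2 has lines a and c.
d <- data.frame(
  Cl = factor(rep(1:2, each = 4)),
  L  = factor(rep(c("b", "c", "a", "c"), each = 2)),
  y  = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Cl/L expands to Cl + Cl:L, so the design matrix gets one column per
# clone-by-line combination, excluding the first level of L ("a").
X <- model.matrix(~ Cl/L, data = d)
colnames(X)
colSums(X != 0)   # a zero count marks a clone-by-line cell that has no data

fit <- lm(y ~ Cl/L, data = d)
coef(fit)         # NA marks columns that were dropped as linearly dependent

# Names of the coefficients that were actually estimated
estimated <- names(coef(fit))[!is.na(coef(fit))]
estimated
```

In this small layout, clone 2 contains line "a", the first level of L, so that line has no column and acts as the clone's reference. Clone 1 does not contain "a", so its last line is the one reported as NA. Running `colnames(model.matrix(tmp.glm))` and `coef(tmp.glm)` on the real fit gives the same kind of view for the example above.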