Hi all,

Thanks for the advice.
> See ?relevel for information on how to reorder the levels of a factor,
> while being able to specify the reference level.
> Basically, the first level of the factor is taken as the reference.

Yes, that is how I always used it. But the problem is, in this particular regression R does *not* take the first level as reference. In fact, AGE appears twice in the same regression (two different interactions) and in one case it selects the 1st category and in another case a different one.

> BTW, you might want to review Frank Harrell's page on why categorizing a
> continuous variable is not a good idea:

I most certainly agree, but the categorisation has been imposed in the survey itself, so it is all the data I have. I did not design the questions :-) ... Thanks for this reference, though, as it is certainly interesting to inform my teaching.

> str(AGE)
 Factor w/ 5 levels "65+","18-24",..: 5 5 1 4 5 5 2 4 1 3 ...

So I expect 65+ to be the reference category, but it is not. Here is a little bit more R code to show the problem:

> str(AGE)
 Factor w/ 5 levels "65+","18-24",..: 5 5 1 4 5 5 2 4 1 3 ...
> table(LABOUR)
LABOUR
   0    1
 692 1409
> NONLABOUR <- 1 - LABOUR
> m <- glm(NOVOTE ~ 0 + LABOUR + NONLABOUR + AGE : LABOUR + AGE : NONLABOUR, family=binomial)
> m

Call:  glm(formula = NOVOTE ~ 0 + LABOUR + NONLABOUR + AGE:LABOUR + AGE:NONLABOUR, family = binomial)

Coefficients:
            LABOUR           NONLABOUR       LABOUR:AGE65+     LABOUR:AGE18-24
          -0.35110            -0.30486            -0.11890            -0.66444
   LABOUR:AGE25-34     LABOUR:AGE35-49     LABOUR:AGE50-64  NONLABOUR:AGE18-24
          -0.23893            -0.15860                  NA            -0.65655
NONLABOUR:AGE25-34  NONLABOUR:AGE35-49  NONLABOUR:AGE50-64
          -0.72815             0.04951             0.17481

As you can see, 65+ is taken as reference category in the interaction with NONLABOUR, but not in the interaction with LABOUR.
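In case it helps to reproduce this without my data, here is a minimal sketch with simulated values. The seed, sample size and probabilities are made up; only the variable names and the order of the AGE levels follow my data. It fits the same formula and then inspects the design matrix with model.matrix() and qr(), which should show which AGE columns R created for each interaction and how many of the columns are linearly independent.

```r
# Simulated stand-in for the survey data: all values below are invented.
set.seed(1)
n <- 2000
age_levels <- c("65+", "18-24", "25-34", "35-49", "50-64")  # same level order as str(AGE)
AGE <- factor(sample(age_levels, n, replace = TRUE), levels = age_levels)
LABOUR <- rbinom(n, 1, 0.67)   # assumed share, roughly as in table(LABOUR)
NONLABOUR <- 1 - LABOUR
NOVOTE <- rbinom(n, 1, 0.4)    # assumed outcome rate, unrelated to AGE or LABOUR

# Same formula as in the real analysis
m <- glm(NOVOTE ~ 0 + LABOUR + NONLABOUR + AGE:LABOUR + AGE:NONLABOUR,
         family = binomial)

# Design matrix that glm() built from the formula
X <- model.matrix(m)
print(colnames(X))   # which AGE levels got a column in each interaction
print(ncol(X))       # number of columns
print(qr(X)$rank)    # number of linearly independent columns

# Coefficients that glm() could not estimate (reported as NA)
print(names(coef(m))[is.na(coef(m))])
```

If ncol(X) is larger than qr(X)$rank, at least one column is a linear combination of the others, and glm() reports the corresponding coefficient as NA.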
I know glm(NOVOTE ~ LABOUR * AGE, family=binomial) would be a more conventional specification, but the above should be equivalent and should give me the coefficients and standard errors for the two groups (LABOUR and NONLABOUR) separately, rather than for the difference / interaction term. Perhaps the NA in the above output (which I noticed only now) is a hint at the problem, but I am not sure why it occurs.

> table(m$model$AGE, m$model$LABOUR, m$model$NOVOTE)
, ,  = 0

          0   1
  65+   137  24
  18-24  68 127
  25-34  59 267
  35-49  71 298
  50-64  82 179

, ,  = 1

          0   1
  65+   101  15
  18-24  26  46
  25-34  21 148
  35-49  55 179
  50-64  72 126

There must be a reason R decides *not* to use 65+ as reference in that particular scenario, and I am missing what it is. Does anyone have an idea?

Jos

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.