G'day Jos, On Mon, 19 Jan 2009 20:22:10 +0000 Jos Elkink <jos.elk...@ucd.ie> wrote:
> Here is a little bit more R code to show the problem: Thanks for that, all becomes clear now. :) > > str(AGE) > Factor w/ 5 levels "65+","18-24",..: 5 5 1 4 5 5 2 4 1 3 ... > > table(LABOUR) > LABOUR > 0 1 > 692 1409 > > NONLABOUR <- 1 - LABOUR > > m <- glm(NOVOTE ~ 0 + LABOUR + NONLABOUR + AGE : LABOUR + AGE : NONLABOUR, > > family=binomial) > > m > > Call: glm(formula = NOVOTE ~ 0 + LABOUR + NONLABOUR + AGE:LABOUR + > AGE:NONLABOUR, family = binomial) > > Coefficients: > LABOUR NONLABOUR LABOUR:AGE65+ LABOUR:AGE18-24 > -0.35110 -0.30486 -0.11890 -0.66444 > LABOUR:AGE25-34 LABOUR:AGE35-49 LABOUR:AGE50-64 NONLABOUR:AGE18-24 > -0.23893 -0.15860 NA -0.65655 > NONLABOUR:AGE25-34 NONLABOUR:AGE35-49 NONLABOUR:AGE50-64 > -0.72815 0.04951 0.17481 > > As you can see, 65+ is taken as reference category in the interaction > with NONLABOUR, but not in the interaction with LABOUR. The way R translates model formulae into design matrices can appear as a somewhat mysterious process. It is claimed that the best account on how this is done can be found in the MASS book. Essentially, if A is a factor with r levels, then if A appears as a main term in the model it will contribute X_A (the matrix with r columns that contains the dummy variables corresponding to the levels of A) to the design matrix. If A appears in a two-way interaction term, then the contribution to the design matrix is the matrix obtained by taking each column of X_A and multiplying it elementwise with each column of the matrix contributed by the other variable. And so forth for higher order interactions. Now, in general adding X_A to the design matrix will lead to a design matrix that does not have full rank. Thus, typically X_A is first multiplied by a contrast matrix, say, C_A and X_A C_A is actually added to the design matrix. The contrast matrix used depends on your settings of options()$contrasts. Likewise X_A C_A is typically used in the construction of the interaction terms and not X_A. But if you play around with "+0" (or "-1"), then things seem to become more complicated. E.g. if +0 is in the model, then for the first factor that is added as a main term, X_A is put into the design matrix, not X_A C_A. But a second factor, say, B would contribute X_B C_B to the design matrix since adding X_B would lead again to a design matrix that does not have full column rank; so some reduction is done immediately. MASS does not really discuss the situation when you use two (apparently) quantitative variables in your model and one factor and the factor only appears in the interaction terms (well, at least not the edition of MASS that I have currently in front of me, which is an older one; my copy of the most recent edition is in a colleague's office), but it appears plausible that the logic of how the design matrix is constructed uses X_A in the AGE:LABOUR interaction and X_A C_A in the AGE:NONLABOUR interaction. Or there is a subtle error in logic in how the design matrix is constructed. Of course due to the columns contributed by LABOUR and AGE:LABOUR, the design matrix has no longer full rank. This is noticed while the model is fitted and, hence, the LABOUR:AGE50-64 parameter is not estimable and a NA is returned. BTW, if you use glm(NOVOTE ~ LABOUR + NONLABOUR + AGE : LABOUR + AGE : NONLABOUR, family=binomial) Then the interaction terms are as you expect (if I understand you correctly). But now, of course, the NONLABOUR parameter is not estimable. > I know glm(NOVOTE ~ LABOUR * AGE, family=binomial) would be a more > conventional specification, but the above should be equivalent and > should give me the coefficients and standard errors for the two groups > (LABOUR and NONLABOUR) separately, rather than for the difference / > interaction term). If I understand you correctly, I would do the following to get a fitted model that directly gives you the estimates and standard errors that you are interested in: 1) Turn LABOUR into a factor to have R do all the work with creating the design matrix: R> FLABOUR <- factor(LABOUR, labels=c("NO", "YES")) 2) Use any of the following commands: R> glm(NOVOTE ~ FLABOUR/AGE - 1, family=binomial) R> glm(NOVOTE ~ FLABOUR/AGE + 0, family=binomial) R> glm(NOVOTE ~ FLABOUR + FLABOUR:AGE - 1, family=binomial) R> glm(NOVOTE ~ FLABOUR + FLABOUR:AGE + 0, family=binomial) Somehow, I have a preference of using "-1" instead of "+0", but that does not really matter; it also does not matter whether these appear at the beginning or the end (or the middle of the formula). The first two forms are, of course, equivalent to the latter two due to the way "/" is interpreted. I prefer the first version (less typing), but the third or fourth version might be clearer to others. HTH. Cheers, Berwin =========================== Full address ============================= Berwin A Turlach Tel.: +65 6516 4416 (secr) Dept of Statistics and Applied Probability +65 6516 6650 (self) Faculty of Science FAX : +65 6872 3919 National University of Singapore 6 Science Drive 2, Blk S16, Level 7 e-mail: sta...@nus.edu.sg Singapore 117546 http://www.stat.nus.edu.sg/~statba ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.