Jos,

See ?relevel for how to reorder the levels of a factor so that a level you specify becomes the reference (first) level.
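A minimal sketch of relevel() in use. The AGE labels below are made up for illustration and are not taken from Jos's data; AGE_ref is likewise just a name chosen for this example.

```r
# Hypothetical AGE factor; the category labels are illustrative assumptions.
AGE <- factor(c("18-29", "30-44", "45-59", "60-74", "75+", "30-44", "45-59"))

# The first level is the one treated as the reference category.
levels(AGE)[1]

# relevel() moves the chosen level to the front and keeps the others in order.
AGE_ref <- relevel(AGE, ref = "45-59")
levels(AGE_ref)

# With the default treatment contrasts, the design matrix has no dummy
# column for the reference level.
colnames(model.matrix(~ AGE_ref))
```

Checking levels() before fitting is a quick way to confirm which category the model will omit.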
By default, the first level of the factor is taken as the reference. If you want a different ordering, as an alternative to the above, simply use:

  AGE <- factor(AGE, levels = c(FirstLevel, SecondLevel, ...))

BTW, you might want to review Frank Harrell's page on why categorizing a continuous variable is not a good idea:

  http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

HTH,

Marc Schwartz

on 01/19/2009 09:52 AM Jos Elkink wrote:
> Hi Thierry,
>
> Thanks for your quick answer. The problem is not so much the LABOUR
> variable, however, but the AGE variable, which consists of about 5
> categories for which I do indeed not create separate dummy variables.
> But R does not behave as expected when deciding on which dummy to use
> as reference category ...
>
> Jos
>
> On Mon, Jan 19, 2009 at 2:37 PM, ONKELINX, Thierry
> <thierry.onkel...@inbo.be> wrote:
>> Dear Jos,
>>
>> In R you don't need to create your own dummy variables. Just create a
>> factor variable LABOUR (with two levels) and rerun your model. Then you
>> should be able to calculate all coefficients.
>>
>> HTH,
>>
>> Thierry
>>
>> ----------------------------------------------------------------------------
>> ir. Thierry Onkelinx
>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>> and Forest
>> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
>> methodology and quality assurance
>> Gaverstraat 4
>> 9500 Geraardsbergen
>> Belgium
>> tel. + 32 54/436 185
>> thierry.onkel...@inbo.be
>> www.inbo.be
>>
>> To call in the statistician after the experiment is done may be no more
>> than asking him to perform a post-mortem examination: he may be able to
>> say what the experiment died of.
>> ~ Sir Ronald Aylmer Fisher
>>
>> The plural of anecdote is not data.
>> ~ Roger Brinner
>>
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body of
>> data.
>> ~ John Tukey
>>
>> -----Original message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On behalf of Jos Elkink
>> Sent: Monday 19 January 2009 15:16
>> To: r-help@r-project.org
>> Subject: [R] reference category for factor in regression
>>
>> Hi all,
>>
>> I am struggling with a strange issue in R that I have not encountered
>> before and I am not sure how to resolve this.
>>
>> The model looks like this, with all irrelevant variables left out:
>>
>> LABOUR - a dummy variable
>> NONLABOUR = 1 - LABOUR
>> AGE - a categorical variable / factor
>> VOTE - a dummy variable
>>
>> glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR : AGE + NONLABOUR : AGE,
>> family=binomial(link="logit"))
>>
>> In other words, a standard interaction model, but I want to know the
>> intercepts and coefficients for each of the two cases (LABOUR and
>> NONLABOUR), instead of getting coefficients for the differences as in
>> a normal interaction model.
>>
>> But the strange thing is, for the two occurrences of the AGE variable,
>> it makes a different choice as to which AGE category to leave out of
>> the regression. The cross-table of AGE with LABOUR does not have empty
>> cells.
>>
>> Anyone any idea what might be going wrong? Or what I could do about
>> this?
>>
>> Thanks in advance for any help!
>>
>> Regards,
>>
>> Jos