Thank you, Dr Ripley. After some false starts and consulting MASS2, Chambers&Hastie and the help files, this worked acceptably.
> xxx$issuecat2<-C(xxx$issuecat2,poly,1) > attr(xxx$issuecat2,"contrasts") .L 0-39 -6.324555e-01 40-49 -3.162278e-01 50-59 -3.287978e-17 60-69 3.162278e-01 70+ 6.324555e-01 > exp.mdl<-glm(actual~gendercat+issuecat2+smokecat, data=xxx,family="poisson",offset=expected) > summary(exp.mdl) Deviance Residuals: Min 1Q Median 3Q Max -0.5596 -0.2327 -0.1671 -0.1199 5.2386 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.57125 0.06650 -68.743 < 2e-16 *** gendercatMale 0.29660 0.06426 4.615 3.92e-06 *** issuecat2.L 2.09161 0.09354 22.360 < 2e-16 *** smokecatSmoker 0.22178 0.07870 2.818 0.00483 ** smokecatUnknown 0.02378 0.08607 0.276 0.78233 The reference category is different, but the effect of a one category increase in age-decade on the log(rate) is(2.09*0.316) = 0.6604 which seems acceptable agreement with my earlier as.numeric(factor) estimate of 0.6614. -- David Winsemius Prof Brian Ripley <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]: > Further to Duncan's comments, you can control factor codings via > options(contrasts=), by setting contrasts() on the factor and via > C(). This does enable you to code an ordered factor as a linear > term, for example. > > The only place I know that this is discussed in any detail is in > Bill Venables' account in MASS chapter 6. > > On Sat, 5 Jan 2008, Duncan Murdoch wrote: > >> On 05/01/2008 7:16 PM, David Winsemius wrote: >>> David Winsemius <[EMAIL PROTECTED]> wrote in >>> news:[EMAIL PROTECTED]: >>> >>>> I have a variable which is roughly age categories in decades. In >>>> the original data, it came in coded: >>>>> str(xxx) >>>> 'data.frame': 58271 obs. of 29 variables: >>>> $ issuecat : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1 1 >>>> 1... >>>> snip >>>> >>>> I then defined issuecat as ordered: >>>>> xxx$issuecat<-as.ordered(xxx$issuecat) >>>> When I include issuecat in a glm model, the result makes me think >>>> I have asked R for a linear+quadratic+cubic+quartic polynomial >>>> fit. The results are not terribly surprising under that >>>> interpretation, but I was hoping for only a linear term (which I >>>> was taught to call a "test of trend"), at least as a starting >>>> point. >>>> >>>>> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson") >>>>> summary(age.mdl) >>>> Call: >>>> glm(formula = actual ~ issuecat, family = "poisson", data = xxx) >>>> >>>> Deviance Residuals: >>>> Min 1Q Median 3Q Max >>>> -0.3190 -0.2262 -0.1649 -0.1221 5.4776 >>>> >>>> Coefficients: >>>> Estimate Std. Error z value Pr(>|z|) >>>> (Intercept) -4.31321 0.04865 -88.665 <2e-16 *** >>>> issuecat.L 2.12717 0.13328 15.960 <2e-16 *** >>>> issuecat.Q -0.06568 0.11842 -0.555 0.579 >>>> issuecat.C 0.08838 0.09737 0.908 0.364 >>>> issuecat^4 -0.02701 0.07786 -0.347 0.729 >>>> >>>> This also means my advice to a another poster this morning may >>>> have been misleading. I have tried puzzling out what I don't >>>> understand by looking at indices or searching in MASSv2, the Blue >>>> Book, Thompson's application of R to Agresti's text, and the FAQ, >>>> so far without success. What I would like to achieve is having >>>> the lowest age category be a reference category (with the >>>> intercept being the log-rate) and each succeeding age category >>>> be incremented by 1. The linear estimate would be the >>>> log(risk-ratio) for increasing ages. I don't want the higher >>>> order polynomial estimates. Am I hoping for too much? >>>> >>> >>> I acheived what I needed by: >>> >>>> xxx$agecat<-as.numeric(xxx$issuecat) >>>> xxx$agecat<-xxx$agecat-1 >>> >>> The results look quite sensible: >>>> exp.mdl<-glm(actual~gendercat+agecat+smokecat, data=xxx, >>> family="poisson", offset=expected) >>>> summary(exp.mdl) >>> >>> Call: >>> glm(formula = actual ~ gendercat + agecat + smokecat, family = >>> "poisson", >>> data = xxx, offset = expected) >>> >>> Deviance Residuals: >>> Min 1Q Median 3Q Max >>> -0.5596 -0.2327 -0.1671 -0.1199 5.2386 >>> >>> Coefficients: >>> Estimate Std. Error z value Pr(>|z|) >>> (Intercept) -5.89410 0.11009 -53.539 < 2e-16 *** >>> gendercatMale 0.29660 0.06426 4.615 3.92e-06 *** >>> agecat 0.66143 0.02958 22.360 < 2e-16 *** >>> smokecatSmoker 0.22178 0.07870 2.818 0.00483 ** >>> smokecatUnknown 0.02378 0.08607 0.276 0.78233 >>> >>> I remain curious about how to correctly control ordered factors, >>> or I should just simply avoid them. >> >> If you're using a factor, R generally assumes you mean each level >> is a different category, so you get levels-1 parameters. If you >> don't want this, you shouldn't use a factor: convert to a numeric >> scale, just as you did. >> >> Duncan Murdoch >> ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.