On Fri, 6 Jun 2014 11:16:11 AM Nwinters wrote:
> I have a variable coded in Stata as follows:
> **
> *gen sat_pm25cat_=.
> replace sat_pm25cat_= 1 if (sat_pm25>=4 & sat_pm25<=7.1 & sat_pm25!=.)
> replace sat_pm25cat_= 2 if (sat_pm25>=7.1 & sat_pm25<=10)
> replace sat_pm25cat_= 3 if (sat_pm25>=10.1 & sat_pm25<=11.3)
> replace sat_pm25cat_= 4 if (sat_pm25>=11.4 & sat_pm25<=12.1)
> replace sat_pm25cat_= 5 if (sat_pm25>=12.2 & sat_pm25<=17.1)
>
> gen satpm25catR= "A" if sat_pm25cat_==1
> replace satpm25catR= "B" if sat_pm25cat_==2
> replace satpm25catR= "C" if sat_pm25cat_==3
> replace satpm25catR= "D" if sat_pm25cat_==4
> replace satpm25catR= "E" if sat_pm25cat_==5
> ***
>
> my model for R is:
> ##
> *glm.PM25linB <-glm(leuk ~ satpm25catR + sex + ageR, data=leuk,
> family=binomial, epsilon=1e-15, maxit=1000)*
> ##
>
> In the summary, satpm25catR is being reported as all levels:
>
> <http://r.789695.n4.nabble.com/file/n4691823/Screen_Shot_2014-06-06_at_2.png>
>
> *What I want is to make "A" the reference level, how do I do this??*
Hi Nwinters,
I get what you want with this example:

leukdf<-
 data.frame(leuk=sample(0:1,100,TRUE),sat_pm25=runif(100,0,17.1),
 sex=sample(c("M","F"),100,TRUE),ageR=sample(20:75,100,TRUE))
leukdf$satpm25catR<-factor(rep(NA,100),levels=LETTERS[1:5])
leukdf$satpm25catR[leukdf$sat_pm25 < 7.1]<-"A"
leukdf$satpm25catR[leukdf$sat_pm25 >= 7.1 & leukdf$sat_pm25 < 10.1]<-"B"
leukdf$satpm25catR[leukdf$sat_pm25 >= 10.1 & leukdf$sat_pm25 < 11.3]<-"C"
leukdf$satpm25catR[leukdf$sat_pm25 >= 11.3 & leukdf$sat_pm25 < 12.1]<-"D"
leukdf$satpm25catR[leukdf$sat_pm25 >= 12.1 & leukdf$sat_pm25 < 17.1]<-"E"
summary(glm(leuk ~ satpm25catR + sex + ageR, data=leukdf,
 family=binomial, epsilon=1e-15, maxit=1000))

Call:
glm(formula = leuk ~ satpm25catR + sex + ageR, family = binomial,
    data = leukdf, epsilon = 1e-15, maxit = 1000)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.4813  -1.1798   0.7631   1.1347   1.5195

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.67565    0.87205   1.922   0.0547
satpm25catRB -0.52289    0.58578  -0.893   0.3721
satpm25catRC -0.79998    0.78405  -1.020   0.3076
satpm25catRD -0.36488    0.88162  -0.414   0.6790
satpm25catRE -0.65372    0.51461  -1.270   0.2040
sexM         -0.54063    0.42073  -1.285   0.1988
ageR         -0.02095    0.01455  -1.440   0.1500

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.59  on 99  degrees of freedom
Residual deviance: 133.74  on 93  degrees of freedom
AIC: 147.74

Number of Fisher Scoring iterations: 5

There is no satpm25catRA row in the coefficients: "A" is the first level of the factor, so it is the reference level and is absorbed into the intercept.

It may be a problem with the way you have calculated the categorical variable, as David noted. However, if you haven't read a paper I had published a few years ago titled "On the perils of categorizing responses", you might want to have a look.
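
If the reference level needs to be pinned explicitly, something along these lines may help. This is only a sketch under assumptions: it reuses the simulated leukdf layout and the break points from the example above, starts the simulated range at 4 so every value falls in a bin, uses an arbitrary seed, and builds the categories with cut() in place of the repeated subsetting. Whether it matches your real data depends on how the variable came across from Stata.

```r
set.seed(1)  # arbitrary seed, only so the simulated data are repeatable

# Simulated data with the same columns as the example above (not real data)
leukdf <- data.frame(
  leuk = sample(0:1, 100, TRUE),
  sat_pm25 = runif(100, 4, 17.1),
  sex = sample(c("M", "F"), 100, TRUE),
  ageR = sample(20:75, 100, TRUE)
)

# cut() returns a factor; right = FALSE gives bins [4,7.1), [7.1,10.1), ...
# and include.lowest = TRUE closes the last bin at 17.1
leukdf$satpm25catR <- cut(leukdf$sat_pm25,
  breaks = c(4, 7.1, 10.1, 11.3, 12.1, 17.1),
  labels = LETTERS[1:5], right = FALSE, include.lowest = TRUE)

# relevel() moves the named level to the front, which makes it the
# reference under the default treatment contrasts
leukdf$satpm25catR <- relevel(leukdf$satpm25catR, ref = "A")

fit <- glm(leuk ~ satpm25catR + sex + ageR, data = leukdf, family = binomial)
levels(leukdf$satpm25catR)
names(coef(fit))
```

If the column arrives in R as a character vector of "A" to "E", factor(x, levels = LETTERS[1:5]) sets the same level order before the model is fitted.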
Jim