[I'm a little confused: are you "Sam Smith" or "Chris Mcowen" ... ?]
This is admittedly a bit confusing, but the best scale on which to compute standard errors is the link scale. It turns out (I hadn't realized this) that predict.glm does give you not-crazy answers when you ask for se.fit=TRUE on the response scale, in which case you can (a) just take +/- 1.96 SE around the response-scale fit and be done. However, these are less accurate than doing what I suggested ((b) computing fit +/- 1.96 SE on the link scale and inverse-link-transforming). Among other things, method "a" produces confidence intervals that are exactly symmetric (generally not true) and that are not necessarily bounded within the appropriate range -- for example, the lower 95% CI bound for females in the example below is slightly < 0 ... ## extended example from ?predict.glm require(graphics) ## example from Venables and Ripley (2002, pp. 190-2.) ldose <- rep(0:5, 2) numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16) sex <- factor(rep(c("M", "F"), c(6, 6))) SF <- cbind(numdead, numalive=20-numdead) budworm.lg <- glm(SF ~ sex*ldose, family=binomial) summary(budworm.lg) ci.scale <- 1.96 plot(c(1,32), c(0,1), type = "n", xlab = "dose", ylab = "prob", log = "x", las=1, bty="l") text(2^ldose, numdead/20, as.character(sex), col=c("red","black")[sex]) ld <- seq(0, 5, 0.1) pM <- predict(budworm.lg, data.frame(ldose=ld, sex=factor(rep("M", length(ld)), levels=levels(sex))), type = "response",se.fit=TRUE) lines(2^ld, pM$fit ) matlines(2^ld,pM$fit+outer(pM$se.fit,ci.scale*c(-1,1)),lty=2,col=1) pF <- predict(budworm.lg, data.frame(ldose=ld, sex=factor(rep("F", length(ld)), levels=levels(sex))), type = "response",se.fit=TRUE) lines(2^ld, pF$fit,col=2) matlines(2^ld,pF$fit+outer(pF$se.fit,ci.scale*c(-1,1)),lty=2,col=2) pM_ex <- predict(budworm.lg, data.frame(ldose=ld, sex=factor(rep("M", length(ld)), levels=levels(sex))), type = "link",se.fit=TRUE) with(pM_ex, matlines(2^ld,plogis(fit+outer(se.fit,ci.scale*c(-1,1))),lty=3,col=1)) pF_ex <- predict(budworm.lg, data.frame(ldose=ld, sex=factor(rep("F", length(ld)), levels=levels(sex))), type = "link",se.fit=TRUE) with(pF_ex, matlines(2^ld,plogis(fit+outer(se.fit,ci.scale*c(-1,1))),lty=3,col=2)) legend("bottomright", c("prediction", "+/- 2 SE (response scale)", "+/- 2 SE (link scale)"), lty=1:3) On 10-09-29 10:38 AM, Chris Mcowen wrote: > Right, that makes sense, thanks > > The reason i used type= response was i wanted to convert the predicted probabilities to the response scale, as surely this is the scale at which a 95CI value is most useful for? > > I.e > >>> pp <- predict(model1,se.fit=TRUE, type = "response") > > > 1 0.68 > > Probability of point 1 having a response of 1 = 0.68 - # this is above the cutoff therefore this has a response of 1 > > Then it would be of most use to get the 95CI on this scale to see if the probability straddles the cutoff value? > > Maybe i am missing something? > > Thanks > > > On 29 Sep 2010, at 15:27, Ben Bolker wrote: > > On 10-09-29 10:04 AM, Sam wrote: >> Hi Ben and list, >> >> Sorry to be a pain! I have followed your code, and modified it - >> > > **You should not use type="response" here.** > The point is that the (symmetric) confidence intervals are computed on > the link/linear predictor > scale, and then inverse-link-transformed (in this case, along with the > fitted values) -- they then > become asymmetric. > >>> pp <- predict(model1,se.fit=TRUE, type = "response") >>>> etaframe <- >>> + with(pp,cbind(fit,lower=fit-1.96*se.fit,upper=fit+1.96*se.fit)) >>>> pframe <- plogis(etaframe) >>>> pframe >> >> My response variable is 0 or 1, the probabilities are all high above my > cut-off point of 0.445, i think this may have something to do with >> >>> you may need to multiply by the binomial N as >>> appropriate. >> >> However how do i multiply if my response is 0 or 1? >> > > if 'size=1' (i.e. Bernoulli trials) then you would be multiplying by > 1, so don't bother. > >> >> On 29 Sep 2010, at 13:44, Ben Bolker wrote: >> >> >> ## from ?glm >> counts <- c(18,17,15,20,10,20,25,13,12) >> outcome <- gl(3,1,9) >> treatment <- gl(3,3) >> d.AD <- data.frame(treatment, outcome, counts) >> glm.D93 <- glm(counts ~ outcome + treatment, family=poisson, >> data=d.AD) >> >> ## predict on 'link' or 'linear predictor' scale, with SEs >> pp <- predict(glm.D93,se.fit=TRUE) >> etaframe <- >> with(pp,cbind(fit,lower=fit-1.96*se.fit,upper=fit+1.96*se.fit)) >> pframe <- exp(etaframe) ## inverse-link >> ## picture >> pframe <- as.data.frame(cbind(obs=d.AD$counts,pframe)) >> library(plotrix) >> plot(pframe$obs,ylim=c(5,30)) >> with(pframe,plotCI(1:9,fit,li=lower,ui=upper,col=2,add=TRUE)) >> >> If you're using a binomial model you need 'plogis' rather than 'exp' >> as your >> inverse link, and you may need to multiply by the binomial N as >> appropriate. >> >> On 10-09-29 06:07 AM, Sam wrote: >>> I am looking to do the same but am a bit confused >> >>>> and apply the inverse link function for your model. >> >>> i understand up to this point and i understand what this means, >>> however i am unsure why it needs to be done and how you do it - i.e >>> i use family="binomial" is this wrong if i use this method to >>> calculate 95% CI? >> >>> Thanks >> >>> Sam On 28 Sep 2010, at 14:50, Ben Bolker wrote: >> >>> zozio32 <remy.pascal <at> gmail.com> writes: >> >>>> >>>> >>>> Hi >>>> >>>> I had to use a glm instead of my basic lm on some data due to >>>> unconstant variance. >>>> >>>> now, when I plot the model over the data, how can I easily get >>>> the 95% confidence interval that sormally coming from: >>>> >>>>> yv <- predict(modelVar,list(aveLength=xv),int="c") >>>>> matlines(xv,yv,lty=c(1,2,2)) >>>> >>>> There is no "interval" argument to pass to the predict function >>>> when using a glm, so I was wondering if I had to use an other >>>> function >>>> >> >>> You need to use predict with se=TRUE; construct the confidence >>> intervals by computing predicted values +- 1.96 times the standard >>> error returned; and apply the inverse link function for your >>> model. >> >>> If heteroscedasticity is your main problem, and not a specific >>> (known) non-normal distribution, you might consider using the gls >>> function from the nlme package with an appropriate 'weights' >>> argument. >> >>> ______________________________________________ R-help@r-project.org >>> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do >>> read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.