Re: [R] Pointwise Confidence Bounds on Logistic Regression

Prof Brian Ripley Thu, 19 Jun 2008 23:53:23 -0700

On Thu, 19 Jun 2008, Bryan Hanson wrote:

Thanks so much to all who offered assistance.  I have to say it would have
taken me a long time to figure this out, so I am most grateful.  Plus,
studying your examples greatly improves my understanding.


As a follow up, the fit process gives the following error:

Warning messages:
1: In eval(expr, envir, enclos) :
 non-integer #successes in a binomial glm!

Is this something I should worry about?  It doesn't arise from glm or
glm.fit

It does arise from glm(), which runs the initialization code for thefamily. Specifically

binomial()$initialize

expression({
    if (NCOL(y) == 1) {
        if (is.factor(y))
            y <- y != levels(y)[1]
        n <- rep.int(1, nobs)
        if (any(y < 0 | y > 1))
            stop("y values must be 0 <= y <= 1")
        mustart <- (weights * y + 0.5)/(weights + 1)
        m <- weights * y
        if (any(abs(m - round(m)) > 0.001))
            warning("non-integer #successes in a binomial glm!")
    }
    else if (NCOL(y) == 2) {
        if (any(abs(y - round(y)) > 0.001))
            warning("non-integer counts in a binomial glm!")
        n <- y[, 1] + y[, 2]
        y <- ifelse(n == 0, 0, y[, 1]/n)
        weights <- weights * n
        mustart <- (n * y + 0.5)/(n + 1)
    }
    else stop("for the binomial family, y must be a vector of 0 and 1's\n",

"or a 2 column matrix where col 1 is no. successes and col 2 isno. failures")

})

You specified a binomial family with responses 0.1 and 0.9. That makes nosense -- the binomial distribution is for integers. Perhaps you mean 1/10and 9/10, in which case you needed a weight of 10, but I'm forced toguess.


Thanks, Bryan

For the record, a final complete working code is appended below.

x <- c(1:40)
y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
data <- as.data.frame(cbind(x,y1,y2))
plot(x, y1, pch = 1, ylim = c(0,1), col = "red", main = "Logistic Regression
w/Confidence Bounds", ylab = "y values",  xlab = "x values")
points(x, y2, pch = 3, col = "blue")
abline(h = 0.5, col = "gray")
fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action =
na.omit)
fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action =
na.omit)
lines(fit1$model$x, fit1$fitted.values, col = "red")
lines(fit2$model$x, fit2$fitted.values, col = "blue")

## predictions on scale of link function
pred1 <- predict(fit1, se.fit = TRUE)
pred2 <- predict(fit2, se.fit = TRUE)

## constant for 95% confidence bands
## getting two t values is redundant here as fit1 and fit2
## have same residual df, but the real world may be different
res.df <- c(fit1$df.residual, fit2$df.residual)
## 0.975 because we want 2.5% in upper and lower tail
const <- qt(0.975, df = res.df)

## confidence bands on scale of link function
upper1 <- pred1$fit + (const[1] * pred1$se.fit)
lower1 <- pred1$fit - (const[1] * pred1$se.fit)
upper2 <- pred2$fit + (const[2] * pred2$se.fit)
lower2 <- pred2$fit - (const[2] * pred2$se.fit)

## bind together into a data frame
bands <- data.frame(upper1, lower1, upper2, lower2)

## transform on to scale of response
bands <- data.frame(lapply(bands, binomial(link = "logit")$linkinv))

## plot confidence bands
lines(fit1$model$x, bands$upper1, col = "pink")
lines(fit1$model$x, bands$lower1, col = "pink")
lines(fit2$model$x, bands$upper2, col = "lightblue")
lines(fit2$model$x, bands$lower2, col = "lightblue")



On 6/19/08 12:28 PM, "Gavin Simpson" <[EMAIL PROTECTED]> wrote:

On Thu, 2008-06-19 at 10:42 -0400, Bryan Hanson wrote:

[I've ommitted some of the conversation so far...]

E.g. in a logistic model, with (say) eta = beta_0 + beta_1*x one may
find, on the
linear predictor scale, A and B (say) such that P(A <= eta <= B) = 0.95.

Then P(expit(A) <= expit(eta) <= expit(B)) = 0.95, which is exactly
what is wanted.


I think I follow the above conceptually, but I don't know how to implement
it, though I fooled around (unsuccessfully) with some of the variations on
predict().

I'm trying to learn this in response to a biology colleague who did
something similar in SigmaPlot. I can already tell that SigmaPlot did a lot
of stuff for him in the background.  The responses are 0/1 of a particular
observation by date.  The following code simulates what's going on (note
that I didn't use 0/1 since this leads to a recognized condition/warning of
fitting 1's and 0's. I've requested Brian's Pattern Recognition book so I
know what the problem is and how to solve it).  My colleague is looking at
two populations in which the "LD50" would differ.  I'd like to be able to
put the "pointwise confidence bounds" on each curve so he can tell if the
populations are really different.

By the way, this code does give a (minor?) error from glm (which you will
see).

Can you make a suggestion about how to get those "confidence bounds" on
there?

Also, is a probit link more appropriate here?

Thanks,  Bryan

x <- c(1:40)
y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
data <- as.data.frame(cbind(x,y1,y2))
plot(x, y1, pch = 1, ylim = c(0,1), col = "red")
points(x, y2, pch = 3, col = "blue")
abline(h = 0.5, col = "gray")
fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action =
na.omit)
fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action =
na.omit)
lines(fit1$model$x, fit1$fitted.values, col = "red")
lines(fit2$model$x, fit2$fitted.values, col = "blue")


The point is to get predictions on the scale of the link function,
generate 95% confidence bands in the normal way and then transform the
confidence bands onto the scale of the response using the inverse of the
link function used to fit the model.

[note, am doing this from memory, so best to check this is right -- I'm
sure someone will tell me very quickly if I have gone wrong anywhere!]

## predictions on scale of link function
pred1 <- predict(fit1, se.fit = TRUE)
pred2 <- predict(fit2, se.fit = TRUE)

## constant for 95% confidence bands
## getting two t values is redundant here as fit1 and fit2
## have same residual df, but the real world may be different
res.df <- c(fit1$df.residual, fit2$df.residual)
## 0.975 because we want 2.5% in upper and lower tail
const <- qt(0.975, df = res.df)

## confidence bands on scale of link function
upper1 <- pred1$fit + (const[1] * pred1$se.fit)
lower1 <- pred1$fit - (const[1] * pred1$se.fit)
upper2 <- pred2$fit + (const[2] * pred2$se.fit)
lower2 <- pred2$fit - (const[2] * pred2$se.fit)

## bind together into a data frame
bands <- data.frame(upper1, lower1, upper2, lower2)

## transform on to scale of response
bands <- data.frame(lapply(bands, binomial(link = "logit")$linkinv))

## plot confidence bands
lines(fit1$model$x, bands$upper1, col = "red", lty = "dotted")
lines(fit1$model$x, bands$lower1, col = "red", lty = "dotted")
lines(fit2$model$x, bands$upper2, col = "blue", lty = "dotted")
lines(fit2$model$x, bands$lower2, col = "blue", lty = "dotted")

HTH

G

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Pointwise Confidence Bounds on Logistic Regression

Reply via email to