Thanks very much for this helpful reply, Thierry

Using aes(weight=trials) in stat_smooth() was part of what I was missing and solves my main
question.
However, for this data, I want to show the extrapolated prediction over a wider range than in the data. Adding xlim() doesn't help here-- the plot annotations are cut off at the lowest
value of Temperature in the data. Is there another way?

ggplot(SpaceShuttle, aes(x = Temperature, y = nFailures / trials)) +
    xlim(30, 81) +
    geom_point() +
    geom_smooth(method = "glm", family = binomial, aes(weight = trials))

Below is the complete (but messy) code for the plot() I would like to more or less replicate using
ggplot()

data("SpaceShuttle", package="vcd")
logit2p <- function(logit) 1/(1 + exp(-logit))

plot(nFailures/6 ~ Temperature, data = SpaceShuttle,
     xlim = c(30, 81), ylim = c(0,1),
     main = "NASA Space Shuttle O-Ring Failures",
     ylab = "Estimated failure probability",
     xlab = "Temperature (degrees F)",
     type="n")  # painters model: add points last

fm <- glm(cbind(nFailures, 6 - nFailures) ~ Temperature,
          data = SpaceShuttle,
          family = binomial)
pred <- predict(fm, data.frame(Temperature = 30 : 81), se=TRUE)

predicted <- data.frame(
    Temperature = 30 : 81,
    prob = logit2p(pred$fit),
    lower = logit2p(pred$fit - 2*pred$se),
    upper = logit2p(pred$fit + 2*pred$se)
    )

with(predicted, {
    polygon(c(Temperature, rev(Temperature)),
            c(lower, rev(upper)), col="lightpink", border=NA)
    lines(Temperature, prob, lwd=3)
    }
    )
with(SpaceShuttle,
    points(Temperature, nFailures/6, col="blue", pch=19, cex=1.3)
    )

I also tried following the example in given in ?geom_smooth, to supply the predicted values over my range of x values, and lower/upper limits in a separate data frame:

pred <- predict(fm, data.frame(Temperature = 30 : 81), se=TRUE)
predicted <- data.frame(
    Temperature = 30 : 81,
    prob = logit2p(pred$fit),
    lower = logit2p(pred$fit - 2*pred$se),
    upper = logit2p(pred$fit + 2*pred$se)
    )
ggplot(SpaceShuttle, aes(x = Temperature, y = nFailures / trials)) +
geom_smooth(aes(ymin = lower, ymax = upper), data=predicted, stat="identity")

but this gives an error:

> ggplot(SpaceShuttle, aes(x = Temperature, y = nFailures / trials)) +
+ geom_smooth(aes(ymin = lower, ymax = upper), data=predicted, stat="identity")
Error in eval(expr, envir, enclos) : object 'nFailures' not found
>

-Michael


On 12/17/2013 11:26 AM, ONKELINX, Thierry wrote:
Dear Michael,

Calculate the propotions. Then it is easy to use the weight option of glm

data("SpaceShuttle", package="vcd")
SpaceShuttle$trials <- 6

fm <- glm(cbind(nFailures, 6 - nFailures) ~ Temperature, data = SpaceShuttle, 
family = binomial)
fm2 <- glm(nFailures/trials ~ Temperature, data = SpaceShuttle, family = 
binomial, weight = trials)
all.equal(coef(fm), coef(fm2))

ggplot(SpaceShuttle, aes(x = Temperature, y = nFailures / trials)) + geom_point() + 
geom_smooth(method = "glm", family = binomial, aes(weight = trials))

Best regards,

Thierry

-----Oorspronkelijk bericht-----
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens 
Michael Friendly
Verzonden: dinsdag 17 december 2013 14:58
Aan: R-help
Onderwerp: [R] ggplot2: stat_smooth for family=binomial with cbind(Y, N) formula

With ggplot2, I can plot the glm stat_smooth for binomial data when the 
response is binary or a two-level factor as follows:

data("Donner", package="vcdExtra")
ggplot(Donner, aes(age, survived)) +
geom_point(position = position_jitter(height = 0.02, width = 0)) + stat_smooth(method = 
"glm", family = binomial, formula = y ~ x, alpha = 0.2, size=2)

But how can I specify the formula for stat_smooth when the response is 
cbind(successes, failures)?
The equivalent with plot (minus the confidence band) for the example I want is:

data("SpaceShuttle", package="vcd")

  > head(SpaceShuttle, 5)
    FlightNumber Temperature Pressure Fail nFailures Damage
1            1          66       50   no         0      0
2            2          70       50  yes         1      4
3            3          69       50   no         0      0
4            4          80       50 <NA>        NA     NA
5            5          68       50   no         0      0
  >

plot(nFailures/6 ~ Temperature, data = SpaceShuttle,
       xlim = c(30, 81), ylim = c(0,1),
       main = "NASA Space Shuttle O-Ring Failures",
       ylab = "Estimated failure probability",
       xlab = "Temperature (degrees F)",
       pch = 19, col = "blue", cex=1.2)
fm <- glm(cbind(nFailures, 6 - nFailures) ~ Temperature,
            data = SpaceShuttle,
            family = binomial)
pred <- predict(fm, data.frame(Temperature = 30 : 81), se=TRUE)
lines(30 : 81,
        predict(fm, data.frame(Temperature = 30 : 81), type = "response"),
        lwd = 3)

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________


--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to