On Dec 30, 2010, at 22:01 , Lensing, Shelly Y wrote: (Once will do...)
> Hi - I am fitting a probit model using glm(), and the deviance and residual > degrees of freedom are different depending on whether I use a binary response > vector of length 80 or a two-column matrix response (10 rows) with the number > of success and failures in each column. I would think that these would be > just two different ways of specifying the same model, but this does not > appear to be the case. > > Binary response vector gives: > Residual deviance: 43.209 on 77 degrees of freedom > > Two-column matrix response gives: > Residual deviance: 4.9204 on 7 degrees of freedom > > I'd like to understand why the two-column response format gives a residual > degrees of freedom of 7, and why the weights for one is nearly, but not > exactly, a multiple of the other. I need the deviance, df, and weights for > another formula, which is why I'm focused on these. My code is below. Thank > you in advance for any assistance! Shelly > Easy: When given in the binary form, glm() has no way to know that your data comes from 10 homogenous groups (consider extending the model with a covariate like z <- runif(80) ). So the "saturated model" is simply bigger. To recover the 7 DF deviance, interject a model that represents the the 10 groups (i.e., unit <- gl(10,8); fu <- glm(y~unit,.....); anova(fu, f180), provided that I follow your code correctly). > **** > > # 10 record set-up > group <- gl(2, 5, 10, labels=c("U","M")) > dose <- rep(c(7, 8, 9, 10, 11), 2) > ldose <- log10(dose) > n <- c(8,8,8,8,8,8,8,8,8,8) > r <- c(0,1,3,8,8,0,0,0,4,5) > p <- r/n > d <- data.frame(group, dose, ldose, n, r, p) > SF <- cbind(success=d$r, failure=d$n - d$r) > > #80 record set-up > dose2<-c(7,8,9,10,11) > doserep<-sort(rep(dose2,8)) > x<-c(doserep,doserep) > log10x<-log10(x) > y_U<-c(rep(0,8), 1, rep(0, 7), 1, 1, 1, rep(0,5), rep(1, 16)) > y_M<-c(rep(0,24), rep(1,4), rep(0,4), rep(1,5), rep(0,3)) > y<-c(y_U, y_M) > trt<-c(rep(1, 40), rep(0, 40)) > > # print x & y's for both > SF > y > ldose > log10x > > # analysis with 10 records and 80 records > f1 <- glm(SF ~ group + ldose, family=binomial(link="probit")) > f3 <- glm(SF ~ ldose, family=binomial(link="probit")) > f180 <- glm(y ~ trt + log10x, family=binomial(link="probit")) > f380 <- glm(y ~ log10x, family=binomial(link="probit")) > > summary(f1) > summary(f180) > > f1$weights > f180$weights > # check weights divided by 8 to see if match -- match several decimal places, > # but not exactly > f1$weights/8 > > **** > > Shelly Lensing > Biostatistics / University of Arkansas for Medical Sciences > > Confidentiality Notice: This e-mail message, including a...{{dropped:7}} > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.