On Sep 29, 2012, at 7:10 AM, Josh Browning wrote: > Hi useRs, > > I'm experiencing something quite weird with glm() and weights, and > maybe someone can explain what I'm doing wrong. I have a dataset > where each row represents a single case, and I run > glm(...,family="binomial") and get my coefficients. However, some of > my cases have the exact same values for predictor variables, so I > should be able to aggregate up my data frame and run glm(..., > family="binomial",weights=wts) and get the same coefficients (maybe > this is my incorrect assumption, but I can't see why it would be). > Anyways, here's a minimum working example below: > >> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) ) >> glm( RESP ~ INDEP, family="binomial", data=d ) > > Call: glm(formula = RESP ~ INDEP, family = "binomial", data = d) > > Coefficients: > (Intercept) INDEP > -1.609 21.176 > > Degrees of Freedom: 9 Total (i.e. Null); 8 Residual > Null Deviance: 13.86 > Residual Deviance: 5.407 AIC: 9.407 >> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length ) >> colnames(dAgg) = c("RESP","INDEP","WT") >> glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT ) > > Call: glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg, > weights = WT) > > Coefficients: > (Intercept) INDEP > -1.609 20.975 > > Degrees of Freedom: 2 Total (i.e. Null); 1 Residual > Null Deviance: 13.86 > Residual Deviance: 5.407 AIC: 9.407
Those two results look very similar and it is with a data situation that seems somewhat extreme. The concern is for the 1% numerical difference in the regression coefficient? Am I reading you correctly? -- David Winsemius, MD Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.