Hi David, Yes, I agree that the results are "very similar" but I don't understand why they are not exactly equal given that the data sets are identical.
And yes, this 1% numerical difference is hugely important to me. I have another data set (much larger than this toy example) that works on the aggregated data (returning a coefficient of about 1) but returns the warning about perfect separation on the non-aggregated data (and a coefficient of about 1e15). So, I'd at least like to be able to understand where this numerical difference is coming from and, preferably, a way to tweak my glm() runs (possibly adjusting the numerical precision somehow???) so that this doesn't happen. Josh On Sat, Sep 29, 2012 at 7:50 PM, David Winsemius <dwinsem...@comcast.net> wrote: > > On Sep 29, 2012, at 7:10 AM, Josh Browning wrote: > >> Hi useRs, >> >> I'm experiencing something quite weird with glm() and weights, and >> maybe someone can explain what I'm doing wrong. I have a dataset >> where each row represents a single case, and I run >> glm(...,family="binomial") and get my coefficients. However, some of >> my cases have the exact same values for predictor variables, so I >> should be able to aggregate up my data frame and run glm(..., >> family="binomial",weights=wts) and get the same coefficients (maybe >> this is my incorrect assumption, but I can't see why it would be). >> Anyways, here's a minimum working example below: >> >>> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) ) >>> glm( RESP ~ INDEP, family="binomial", data=d ) >> >> Call: glm(formula = RESP ~ INDEP, family = "binomial", data = d) >> >> Coefficients: >> (Intercept) INDEP >> -1.609 21.176 >> >> Degrees of Freedom: 9 Total (i.e. Null); 8 Residual >> Null Deviance: 13.86 >> Residual Deviance: 5.407 AIC: 9.407 >>> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length ) >>> colnames(dAgg) = c("RESP","INDEP","WT") >>> glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT ) >> >> Call: glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg, >> weights = WT) >> >> Coefficients: >> (Intercept) INDEP >> -1.609 20.975 >> >> Degrees of Freedom: 2 Total (i.e. Null); 1 Residual >> Null Deviance: 13.86 >> Residual Deviance: 5.407 AIC: 9.407 > > Those two results look very similar and it is with a data situation that > seems somewhat extreme. The concern is for the 1% numerical difference in > the regression coefficient? Am I reading you correctly? > > -- > David Winsemius, MD > Alameda, CA, USA > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.