I haven't followed this thread closely, but if perfect separation in a binomial glm is the problem, google it, e.g.
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/complete_separation_logit_models.htm

This presumably explains your concerns about coefficient agreement.

-- Bert

On Sun, Sep 30, 2012 at 4:47 AM, Josh Browning <rockclimber112...@gmail.com> wrote:
> Hi David,
>
> Yes, I agree that the results are "very similar," but I don't
> understand why they are not exactly equal, given that the data sets are
> identical.
>
> And yes, this 1% numerical difference is hugely important to me. I
> have another data set (much larger than this toy example) that works
> on the aggregated data (returning a coefficient of about 1) but
> returns the warning about perfect separation on the non-aggregated
> data (and a coefficient of about 1e15). So, I'd at least like to
> understand where this numerical difference is coming from and,
> preferably, find a way to tweak my glm() runs (possibly by adjusting the
> numerical precision somehow?) so that this doesn't happen.
>
> Josh
>
> On Sat, Sep 29, 2012 at 7:50 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>>
>> On Sep 29, 2012, at 7:10 AM, Josh Browning wrote:
>>
>>> Hi useRs,
>>>
>>> I'm experiencing something quite weird with glm() and weights, and
>>> maybe someone can explain what I'm doing wrong. I have a dataset
>>> where each row represents a single case, and I run
>>> glm(..., family="binomial") and get my coefficients. However, some of
>>> my cases have exactly the same values for the predictor variables, so I
>>> should be able to aggregate my data frame and run glm(...,
>>> family="binomial", weights=wts) and get the same coefficients (maybe
>>> this assumption is incorrect, but I can't see why it would be).
>>> Anyway, here's a minimal working example:
>>>
>>>> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) )
>>>> glm( RESP ~ INDEP, family="binomial", data=d )
>>>
>>> Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = d)
>>>
>>> Coefficients:
>>> (Intercept)        INDEP
>>>      -1.609       21.176
>>>
>>> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
>>> Null Deviance:     13.86
>>> Residual Deviance: 5.407    AIC: 9.407
>>>
>>>> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length )
>>>> colnames(dAgg) = c("RESP","INDEP","WT")
>>>> glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT )
>>>
>>> Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg,
>>>     weights = WT)
>>>
>>> Coefficients:
>>> (Intercept)        INDEP
>>>      -1.609       20.975
>>>
>>> Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
>>> Null Deviance:     13.86
>>> Residual Deviance: 5.407    AIC: 9.407
>>
>> Those two results look very similar, and it is a data situation that
>> seems somewhat extreme. The concern is the 1% numerical difference in
>> the regression coefficient? Am I reading you correctly?
>>
>> --
>> David Winsemius, MD
>> Alameda, CA, USA
>>
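[Editor's note: a minimal sketch, not part of the original exchange, illustrating the point. Because INDEP perfectly separates RESP in this toy data, the INDEP coefficient has no finite maximum-likelihood estimate; the ~21 vs. ~20.975 values are simply where IRLS happened to stop in each run, and the two data layouts accumulate floating-point rounding differently. Varying the convergence tolerance via glm.control() (the epsilon values below are illustrative) moves both estimates, which shows neither number is meaningful.]

## Sketch only: the INDEP estimate under perfect separation is an
## artifact of when IRLS stops, not a real parameter estimate.
d    <- data.frame(RESP  = c(rep(1, 5), rep(0, 5)),
                   INDEP = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0))
dAgg <- aggregate(d$RESP, by = list(d$RESP, d$INDEP), FUN = length)
colnames(dAgg) <- c("RESP", "INDEP", "WT")

## Loosen, then tighten, the IRLS convergence tolerance; the INDEP
## coefficient shifts each time because it is diverging toward +Inf.
for (eps in c(1e-4, 1e-8, 1e-14)) {
  ctl  <- glm.control(epsilon = eps, maxit = 100)
  fit1 <- glm(RESP ~ INDEP, family = "binomial", data = d, control = ctl)
  fit2 <- glm(RESP ~ INDEP, family = "binomial", data = dAgg,
              weights = WT, control = ctl)
  cat(sprintf("epsilon = %g: unaggregated %.3f, aggregated %.3f\n",
              eps, coef(fit1)["INDEP"], coef(fit2)["INDEP"]))
}

[If a finite, stable estimate is actually needed, the usual remedy is a penalized or bias-reduced fit (e.g., Firth's method as implemented in the logistf or brglm packages) rather than tightening glm()'s numerical precision, which cannot recover a parameter the data do not identify.]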
--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm