The second case also needs the argument weights=n. Then all 3 models should give the same general fit (same coefficients, same predicted values).
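A minimal sketch in R, on made-up data, of the claim that the three forms agree once weights=n is supplied. The factor levels and the counts n (trials) and r (successes) below are hypothetical. They were chosen only so that every age/sex/class cell has both successes and failures. The code fits the three formulas from the question and prints their coefficients, fitted probabilities and residual deviances side by side.

```r
# Hypothetical grouped data: one row per age/sex/class cell, with
# n = number of trials and r = number of successes (made-up counts).
grouped <- expand.grid(age = c("young", "old"),
                       sex = c("F", "M"),
                       class = c("A", "B"))  # expand.grid makes these factors
grouped$n <- c(10, 12, 8, 15, 9, 11, 14, 10)
grouped$r <- c(9, 5, 3, 10, 2, 6, 7, 4)

# Expand to one row per individual trial with a 0/1 success column:
# each cell contributes r ones followed by n - r zeros.
rows <- rep(seq_len(nrow(grouped)), times = grouped$n)
ungrouped <- grouped[rows, c("age", "sex", "class")]
ungrouped$success <- unlist(lapply(seq_len(nrow(grouped)), function(i)
  rep(c(1, 0), times = c(grouped$r[i], grouped$n[i] - grouped$r[i]))))

# Case 1: individual 0/1 responses.
fit1 <- glm(success ~ age * sex * class, family = binomial, data = ungrouped)
# Case 2: observed proportions, weighted by the number of trials.
fit2 <- glm(r / n ~ age * sex * class, family = binomial,
            weights = n, data = grouped)
# Case 3: two-column matrix of successes and failures.
fit3 <- glm(cbind(r, n - r) ~ age * sex * class, family = binomial,
            data = grouped)

print(cbind(case1 = coef(fit1), case2 = coef(fit2), case3 = coef(fit3)))

# Fitted success probability for each cell under each fit.
p1 <- predict(fit1, newdata = grouped, type = "response")
print(cbind(case1 = p1, case2 = fitted(fit2), case3 = fitted(fit3)))

# Residual deviance and residual df differ between ungrouped and grouped fits.
print(rbind(deviance = c(deviance(fit1), deviance(fit2), deviance(fit3)),
            df = c(df.residual(fit1), df.residual(fit2), df.residual(fit3))))
```

The coefficients and per-cell probabilities agree to numerical tolerance, and fits 2 and 3 are the same computation. The residual deviances do not agree. With the full age*sex*class interaction on factors, the grouped fits are saturated at the group level (deviance essentially 0 on 0 df). The ungrouped fit still has a positive deviance, because its saturated model fits each 0/1 response exactly. If weights = n is dropped from fit2, glm() warns about non-integer successes and treats each row as a single trial.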
The differences are subtle and may not matter for your purposes. Conceptually, ask yourself: did you run 10 trials under one set of conditions (age=x, sex=y, class=z) and observe 9 successes? That is model 2/3. Or did you run a bunch of individual trials, and just by chance 10 of them happened to share the same conditions (age=x, sex=y, class=z), with 9 of those 10 being successes? That is model 1.

The biggest visible difference is in the deviance calculations. It arises because in model 1 the saturated model can fit every point exactly (the responses are all 0 or 1), whereas in the other 2 the saturated model fits each combination of predictors with its observed proportion, and those proportions are no longer 0/1.

The most important difference comes when you decide to extend the model (mixed effects, bootstrapping), because the observational unit differs between model 1 and models 2 & 3: in model 1 it is the individual trial, in models 2 & 3 it is the group of trials sharing one combination of predictors. (I don't know of any differences between 2 & 3 other than looks/convenience.)

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> project.org] On Behalf Of andyer weng
> Sent: Monday, October 20, 2008 4:39 PM
> To: r-help@r-project.org
> Subject: Re: [R] Categorical Response Query
>
> Hi all,
>
> I have a question about a categorical response.
>
> I have a data frame containing age, sex, class, success (1=success,
> 0=non-success).
> age, sex, class are the explanatory variables, and success is the
> response variable. I can also get n (the number of times each age
> occurs) and r (the number of successes at that age).
>
> When I try to create the regression relationship for these variables, I
> have seen many different cases; I just wonder which one fits my
> situation best.
>
> 1st case
>
> xxx.glm<-glm(success~age*sex*class,family=binomial, data=xxx.data)
>
> 2nd case
>
> xxx.glm<-glm(r/n~age*sex*class,family=binomial, data=xxx.data)
>
> 3rd case
>
> xxx.glm<-glm(cbind(r,n-r)~age*sex*class,family=binomial, data=xxx.data)
>
> What is the difference between the above 3 cases? Which one is the best
> to use?
>
> If I don't group the data, can I use the 1st case? If I group the
> data, can I use the 2nd or 3rd case?
>
> Please advise.
>
> Cheers.
> Andyer
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.