Just a small correction to my previous reply: I meant to say that the probabilities of success for each type add up to one.
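Since the three types' success probabilities sum to one, the three outcomes are not independent, and one way to respect that constraint is to fit them jointly as a multinomial logit rather than as three separate binomials. A minimal sketch using nnet::multinom (nnet ships as a recommended package); the data frame `wide` and its count columns `N_A`, `N_B`, `N_C` are hypothetical stand-ins, not the poster's actual objects:

```r
## Sketch only: 'wide' is a hypothetical data frame with one row per
## A/B/C/D combination and columns N_A, N_B, N_C holding the counts of
## balls of each type drawn in that cell.  multinom() accepts a matrix
## of counts as the response, one column per class.
library(nnet)

fit <- multinom(cbind(N_A, N_B, N_C) ~ A + B + C + D, data = wide)
summary(fit)
```

This fits log-odds of types b and c relative to type a as functions of A, B, C, D, with the sum-to-one constraint built in.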
Thanks, Frank, for noticing the mistake.

Bill.Venables wrote:
>
> It looks like A*B*C*D is a complete, totally saturated model (the
> residual deviance is effectively zero, and the residual degrees of
> freedom are exactly zero - this is a clue). So when you try to put even
> more parameters into the model and even higher-way interactions,
> something has to give.
>
> I find 3-factor interactions are about as much as I can think about
> without getting a bit giddy. Do you really need 4- and 5-factor
> interactions? If so, your only option is to get more data.
>
>
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary): +61 7 3826 7304
> Mobile: +61 4 8819 4402
> Home Phone: +61 7 3286 7700
> mailto:[EMAIL PROTECTED]
> http://www.cmis.csiro.au/bill.venables/
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of Mikhail Spivakov
> Sent: Wednesday, 25 June 2008 9:31 AM
> To: r-help@r-project.org
> Subject: [R] logistic regression
>
>
> Hi everyone,
>
> I'm sorry if this turns out to be more a statistical question than one
> specifically about R - but I would greatly appreciate your advice anyway.
>
> I've been using a logistic regression model to look at the relationship
> between a binary outcome (say, the odds of picking n white balls from a
> bag containing m balls in total) and a variety of other binary parameters:
>
> _________________________________________________________________
>
>> a.fit <- glm(data = a, formula = cbind(WHITE, ALL - WHITE) ~ A * B * C * D,
>>              family = binomial(link = "logit"))
>> summary(a.fit)
>
> glm(formula = cbind(WHITE, ALL - WHITE) ~ A * B * C * D,
>     family = binomial(link = "logit"), data = a)
>
> Deviance Residuals:
> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> Coefficients:
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -0.69751    0.02697 -25.861  < 2e-16 ***
> A            -0.02911    0.05451  -0.534 0.593335
> B             0.39842    0.06871   5.798 6.70e-09 ***
> C             0.82900    0.06745  12.290  < 2e-16 ***
> D             0.05928    0.11133   0.532 0.594401
> A:B          -0.44053    0.13807  -3.191 0.001419 **
> A:C          -0.49596    0.13664  -3.630 0.000284 ***
> B:C          -0.62194    0.14164  -4.391 1.13e-05 ***
> A:D          -0.40310    0.22790  -1.769 0.076938 .
> B:D          -0.60238    0.25978  -2.319 0.020407 *
> C:D          -0.58467    0.27195  -2.150 0.031558 *
> A:B:C         0.50060    0.27364   1.829 0.067335 .
> A:B:D         0.51868    0.46830   1.108 0.268049
> A:C:D         0.32882    0.51226   0.642 0.520943
> B:C:D         0.56301    0.49903   1.128 0.259231
> A:B:C:D      -0.32115    0.87969  -0.365 0.715059
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 2.2185e+02 on 15 degrees of freedom
> Residual deviance: 1.0385e-12 on  0 degrees of freedom
> AIC: 124.50
>
> Number of Fisher Scoring iterations: 3
>
> _________________________________________________________________
>
> This seems to produce sensible results given the actual data.
> However, there are actually three types of balls in the experiment, and I
> need to model the relationship between the odds of picking each of the
> types and the parameters A, B, C, D. So what I do now is split the initial
> data table and just run glm three times:
>
>> all
>
> [fictional data]
>
> TYPE WHITE ALL A B C D
> a      100 400 1 0 0 0
> b      200 600 1 0 0 0
> c       10 300 1 0 0 0
> ....
> a       30 100 1 1 1 1
> b       50 200 1 1 1 1
> c       20 120 1 1 1 1
>
>> a <- all[all$TYPE == "a", ]
>> b <- all[all$TYPE == "b", ]
>> c <- all[all$TYPE == "c", ]
>> a.fit <- glm(data = a, formula = cbind(WHITE, ALL - WHITE) ~ A * B * C * D,
>>              family = binomial(link = "logit"))
>> b.fit <- glm(data = b, formula = cbind(WHITE, ALL - WHITE) ~ A * B * C * D,
>>              family = binomial(link = "logit"))
>> c.fit <- glm(data = c, formula = cbind(WHITE, ALL - WHITE) ~ A * B * C * D,
>>              family = binomial(link = "logit"))
>
> But it seems to me that I should be able to incorporate TYPE into the
> model. Something like:
>
>> summary(glm(data = example2, family = binomial(link = "logit"),
>>             formula = cbind(WHITE, ALL - WHITE) ~ TYPE * A * B * C * D))
>
> [please see the output below]
>
> However, when I do this, it does not seem to give the expected result.
> Is this not the right way to do it?
> Or is this actually less powerful than running the three models
> separately?
>
> I will greatly appreciate your advice!
>
> Many thanks
> Mikhail
>
> -----
>
>                Estimate Std. Error   z value Pr(>|z|)
> (Intercept)   -8.90E-01   1.91E-02   -46.553  < 2e-16 ***
> TYPE1          1.93E-01   2.47E-02     7.822 5.18E-15 ***
> TYPE2          1.19E+00   2.42E-02    49.108  < 2e-16 ***
> A              1.89E-01   3.34E-02     5.665 1.47E-08 ***
> B              1.60E-01   4.41E-02     3.627 0.000286 ***
> C              2.24E-02   4.91E-02     0.455 0.649060
> D              1.96E-01   6.58E-02     2.982 0.002868 **
> TYPE1:A       -2.19E-01   4.59E-02    -4.759 1.95E-06 ***
> TYPE2:A       -9.08E-01   4.50E-02   -20.178  < 2e-16 ***
> TYPE1:B        2.39E-01   5.93E-02     4.022 5.77E-05 ***
> TYPE2:B       -1.82E+00   6.46E-02   -28.178  < 2e-16 ***
> A:B           -2.26E-01   8.52E-02    -2.649 0.008066 **
> TYPE1:C        8.07E-01   6.27E-02    12.870  < 2e-16 ***
> TYPE2:C       -2.51E+00   7.83E-02   -32.039  < 2e-16 ***
> A:C           -1.70E-01   9.51E-02    -1.783 0.074512 .
> B:C           -3.01E-01   1.12E-01    -2.698 0.006985 **
> TYPE1:D       -1.37E-01   9.20E-02    -1.489 0.136548
> TYPE2:D       -1.13E+00   9.19E-02   -12.329  < 2e-16 ***
> A:D           -2.11E-01   1.27E-01    -1.655 0.097953 .
> B:D           -2.15E-01   1.55E-01    -1.387 0.165472
> C:D           -5.51E-01   2.76E-01    -1.997 0.045829 *
> TYPE1:A:B     -2.15E-01   1.17E-01    -1.840 0.065714 .
> TYPE2:A:B      7.21E-01   1.28E-01     5.635 1.75E-08 ***
> TYPE1:A:C     -3.26E-01   1.24E-01    -2.643 0.008221 **
> TYPE2:A:C      9.70E-01   1.53E-01     6.360 2.02E-10 ***
> TYPE1:B:C     -3.21E-01   1.38E-01    -2.321 0.020313 *
> TYPE2:B:C      1.35E+00   1.89E-01     7.133 9.85E-13 ***
> A:B:C          1.80E-01   2.11E-01     0.852 0.394425
> TYPE1:A:D     -1.92E-01   1.83E-01    -1.050 0.293758
> TYPE2:A:D      6.76E-01   1.80E-01     3.750 0.000177 ***
> TYPE1:B:D     -3.87E-01   2.16E-01    -1.796 0.072443 .
> TYPE2:B:D      1.09E+00   2.30E-01     4.709 2.49E-06 ***
> A:B:D          1.92E-01   2.73E-01     0.702 0.482512
> TYPE1:C:D     -3.33E-02   3.18E-01    -0.105 0.916465
> TYPE2:C:D      1.20E-01   5.05E-01     0.238 0.811914
> A:C:D         -7.37E+00   1.74E+04 -4.23E-04 0.999663
> B:C:D          3.14E-01   4.92E-01     0.638 0.523254
> TYPE1:A:B:C    3.21E-01   2.64E-01     1.218 0.223336
> TYPE2:A:B:C   -8.43E-01   3.59E-01    -2.351 0.018747 *
> TYPE1:A:B:D    3.27E-01   3.84E-01     0.850 0.395200
> TYPE2:A:B:D   -6.59E-01   4.08E-01    -1.617 0.105883
> TYPE1:A:C:D    7.69E+00   1.74E+04  4.42E-04 0.999648
> TYPE2:A:C:D   -1.60E+01   3.48E+04 -4.58E-04 0.999634
> TYPE1:B:C:D    2.49E-01   5.70E-01     0.437 0.662288
> TYPE2:B:C:D   -7.08E-01   8.97E-01    -0.789 0.430007
> A:B:C:D        9.08E-03   2.47E+04  3.67E-07 1.000000
> TYPE1:A:B:C:D -3.30E-01   2.47E+04 -1.34E-05 0.999989
> TYPE2:A:B:C:D  1.10E+00   4.94E+04  2.22E-05 0.999982
>
> --
> View this message in context:
> http://www.nabble.com/logistic-regression-tp18102137p18102137.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
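On the question in the quoted message: because TYPE interacts with every term, TYPE*A*B*C*D spends exactly as many parameters as the three separate per-type fits, so the combined model is again fully saturated - it is a reparameterisation, not a gain in power. The astronomical standard errors (e.g. 1.74E+04 on A:C:D) are the classic symptom of cells with no successes or no failures, where the fitted log-odds diverge. One way forward is to compare the saturated fit against a more economical model with a likelihood-ratio test; a sketch, assuming the poster's long-format data frame `all` with columns TYPE, WHITE, ALL, A, B, C, D:

```r
## Sketch only: 'all' is the poster's data frame.  TYPE must be a factor
## so glm() expands it into dummy variables.
all$TYPE <- factor(all$TYPE)

## Fully saturated: one parameter per TYPE x A x B x C x D cell,
## equivalent to fitting the three types separately.
full <- glm(cbind(WHITE, ALL - WHITE) ~ TYPE * A * B * C * D,
            family = binomial(link = "logit"), data = all)

## More economical: TYPE-specific main effects only.
reduced <- glm(cbind(WHITE, ALL - WHITE) ~ TYPE * (A + B + C + D),
               family = binomial(link = "logit"), data = all)

## Likelihood-ratio test of whether the extra interactions earn their keep.
anova(reduced, full, test = "Chisq")
```

If the test is unconvincing, the reduced model's standard errors will also be far better behaved, since it pools information across cells.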
--
View this message in context:
http://www.nabble.com/logistic-regression-tp18102137p18111648.html
Sent from the R help mailing list archive at Nabble.com.