Sorry for flooding this forum, but I think I've realised that I need multinomial logistic regression for my problem... I'd be interested in your opinion on whether this is actually any better than running three binomial logistic regressions separately.
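To make this concrete, here is roughly what I have in mind -- a sketch only, with made-up count columns N.a, N.b, N.c in a hypothetical data frame d (in reality I would first have to reshape my table so that each row carries the counts of all three ball types for one combination of A, B, C, D):

_________________________________________________________________

# Multinomial logit via nnet::multinom (nnet ships with R).
# The response is a matrix of counts, one column per ball type;
# multinom then fits one coefficient set per non-reference type.
library(nnet)

fit.multi <- multinom(cbind(N.a, N.b, N.c) ~ A * B * C * D, data = d)
summary(fit.multi)

_________________________________________________________________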
Thanks,
M

On Wed, Jun 25, 2008 at 1:17 AM, <[EMAIL PROTECTED]> wrote:

> It looks like A*B*C*D is a complete, totally saturated model (the
> residual deviance is effectively zero, and the residual degrees of
> freedom are exactly zero - this is a clue). So when you try to put even
> more parameters into the model and even higher-way interactions,
> something has to give.
>
> I find 3-factor interactions are about as much as I can think about
> without getting a bit giddy. Do you really need 4- and 5-factor
> interactions? If so, your only option is to get more data.
>
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary): +61 7 3826 7304
> Mobile: +61 4 8819 4402
> Home Phone: +61 7 3286 7700
> mailto:[EMAIL PROTECTED]
> http://www.cmis.csiro.au/bill.venables/
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> On Behalf Of Mikhail Spivakov
> Sent: Wednesday, 25 June 2008 9:31 AM
> To: r-help@r-project.org
> Subject: [R] logistic regression
>
> Hi everyone,
>
> I'm sorry if this turns out to be more of a statistical question than
> one specifically about R - but I would greatly appreciate your advice
> anyway.
>
> I've been using a logistic regression model to look at the relationship
> between a binary outcome (say, the odds of picking n white balls from a
> bag containing m balls in total) and a variety of other binary
> parameters:
>
> _________________________________________________________________
>
> > a.fit <- glm(data = a, formula = cbind(WHITE, ALL - WHITE) ~ A*B*C*D,
> +              family = binomial(link = "logit"))
> > summary(a.fit)
>
> Call:
> glm(formula = cbind(WHITE, ALL - WHITE) ~ A * B * C * D,
>     family = binomial(link = "logit"), data = a)
>
> Deviance Residuals:
>  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.69751    0.02697 -25.861  < 2e-16 ***
> A           -0.02911    0.05451  -0.534 0.593335
> B            0.39842    0.06871   5.798 6.70e-09 ***
> C            0.82900    0.06745  12.290  < 2e-16 ***
> D            0.05928    0.11133   0.532 0.594401
> A:B         -0.44053    0.13807  -3.191 0.001419 **
> A:C         -0.49596    0.13664  -3.630 0.000284 ***
> B:C         -0.62194    0.14164  -4.391 1.13e-05 ***
> A:D         -0.40310    0.22790  -1.769 0.076938 .
> B:D         -0.60238    0.25978  -2.319 0.020407 *
> C:D         -0.58467    0.27195  -2.150 0.031558 *
> A:B:C        0.50060    0.27364   1.829 0.067335 .
> A:B:D        0.51868    0.46830   1.108 0.268049
> A:C:D        0.32882    0.51226   0.642 0.520943
> B:C:D        0.56301    0.49903   1.128 0.259231
> A:B:C:D     -0.32115    0.87969  -0.365 0.715059
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 2.2185e+02  on 15  degrees of freedom
> Residual deviance: 1.0385e-12  on  0  degrees of freedom
> AIC: 124.50
>
> Number of Fisher Scoring iterations: 3
>
> _________________________________________________________________
>
> This seems to produce sensible results given the actual data.
> However, there are actually three types of balls in the experiment, and
> I need to model the relationship between the odds of picking each of
> the types and the parameters A, B, C, D. So what I do now is split the
> initial data table and just run glm three times:
>
> > all
>
> [fictional data]
>
>   TYPE WHITE ALL A B C D
>   a      100 400 1 0 0 0
>   b      200 600 1 0 0 0
>   c       10 300 1 0 0 0
>   ....
>   a       30 100 1 1 1 1
>   b       50 200 1 1 1 1
>   c       20 120 1 1 1 1
>
> > a <- all[all$TYPE == "a", ]
> > b <- all[all$TYPE == "b", ]
> > c <- all[all$TYPE == "c", ]
> > a.fit <- glm(data = a, formula = cbind(WHITE, ALL - WHITE) ~ A*B*C*D,
> +              family = binomial(link = "logit"))
> > b.fit <- glm(data = b, formula = cbind(WHITE, ALL - WHITE) ~ A*B*C*D,
> +              family = binomial(link = "logit"))
> > c.fit <- glm(data = c, formula = cbind(WHITE, ALL - WHITE) ~ A*B*C*D,
> +              family = binomial(link = "logit"))
>
> But it seems to me that I should be able to incorporate TYPE into the
> model directly. Something like:
>
> > summary(glm(data = example2, family = binomial(link = "logit"),
> +             formula = cbind(WHITE, ALL - WHITE) ~ TYPE*A*B*C*D))
>
> [please see the output below]
>
> However, when I do this, it does not seem to give the expected result.
> Is this not the right way to do it?
> Or is this actually less powerful than running the three models
> separately?
>
> Will greatly appreciate your advice!
>
> Many thanks
> Mikhail
>
> -----
>
>                Estimate Std. Error   z value Pr(>|z|)
> (Intercept)   -8.90e-01   1.91e-02   -46.553  < 2e-16 ***
> TYPE1          1.93e-01   2.47e-02     7.822 5.18e-15 ***
> TYPE2          1.19e+00   2.42e-02    49.108  < 2e-16 ***
> A              1.89e-01   3.34e-02     5.665 1.47e-08 ***
> B              1.60e-01   4.41e-02     3.627 0.000286 ***
> C              2.24e-02   4.91e-02     0.455 0.64906
> D              1.96e-01   6.58e-02     2.982 0.002868 **
> TYPE1:A       -2.19e-01   4.59e-02    -4.759 1.95e-06 ***
> TYPE2:A       -9.08e-01   4.50e-02   -20.178  < 2e-16 ***
> TYPE1:B        2.39e-01   5.93e-02     4.022 5.77e-05 ***
> TYPE2:B       -1.82e+00   6.46e-02   -28.178  < 2e-16 ***
> A:B           -2.26e-01   8.52e-02    -2.649 0.008066 **
> TYPE1:C        8.07e-01   6.27e-02    12.870  < 2e-16 ***
> TYPE2:C       -2.51e+00   7.83e-02   -32.039  < 2e-16 ***
> A:C           -1.70e-01   9.51e-02    -1.783 0.074512 .
> B:C           -3.01e-01   1.12e-01    -2.698 0.006985 **
> TYPE1:D       -1.37e-01   9.20e-02    -1.489 0.136548
> TYPE2:D       -1.13e+00   9.19e-02   -12.329  < 2e-16 ***
> A:D           -2.11e-01   1.27e-01    -1.655 0.097953 .
> B:D           -2.15e-01   1.55e-01    -1.387 0.165472
> C:D           -5.51e-01   2.76e-01    -1.997 0.045829 *
> TYPE1:A:B     -2.15e-01   1.17e-01    -1.840 0.065714 .
> TYPE2:A:B      7.21e-01   1.28e-01     5.635 1.75e-08 ***
> TYPE1:A:C     -3.26e-01   1.24e-01    -2.643 0.008221 **
> TYPE2:A:C      9.70e-01   1.53e-01     6.360 2.02e-10 ***
> TYPE1:B:C     -3.21e-01   1.38e-01    -2.321 0.020313 *
> TYPE2:B:C      1.35e+00   1.89e-01     7.133 9.85e-13 ***
> A:B:C          1.80e-01   2.11e-01     0.852 0.394425
> TYPE1:A:D     -1.92e-01   1.83e-01    -1.050 0.293758
> TYPE2:A:D      6.76e-01   1.80e-01     3.750 0.000177 ***
> TYPE1:B:D     -3.87e-01   2.16e-01    -1.796 0.072443 .
> TYPE2:B:D      1.09e+00   2.30e-01     4.709 2.49e-06 ***
> A:B:D          1.92e-01   2.73e-01     0.702 0.482512
> TYPE1:C:D     -3.33e-02   3.18e-01    -0.105 0.916465
> TYPE2:C:D      1.20e-01   5.05e-01     0.238 0.811914
> A:C:D         -7.37e+00   1.74e+04 -4.23e-04 0.999663
> B:C:D          3.14e-01   4.92e-01     0.638 0.523254
> TYPE1:A:B:C    3.21e-01   2.64e-01     1.218 0.223336
> TYPE2:A:B:C   -8.43e-01   3.59e-01    -2.351 0.018747 *
> TYPE1:A:B:D    3.27e-01   3.84e-01     0.850 0.3952
> TYPE2:A:B:D   -6.59e-01   4.08e-01    -1.617 0.105883
> TYPE1:A:C:D    7.69e+00   1.74e+04  4.42e-04 0.999648
> TYPE2:A:C:D   -1.60e+01   3.48e+04 -4.58e-04 0.999634
> TYPE1:B:C:D    2.49e-01   5.70e-01     0.437 0.662288
> TYPE2:B:C:D   -7.08e-01   8.97e-01    -0.789 0.430007
> A:B:C:D        9.08e-03   2.47e+04  3.67e-07 1
> TYPE1:A:B:C:D -3.30e-01   2.47e+04 -1.34e-05 0.999989
> TYPE2:A:B:C:D  1.10e+00   4.94e+04  2.22e-05 0.999982
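P.S. For completeness, here is a sketch of the comparison I had been attempting between the single joint fit and the three separate fits, written against the fictional "all" table from the quoted message (the object names joint.fit and main.fit are mine):

_________________________________________________________________

# With the full set of TYPE interactions, the joint fit is exactly
# equivalent to the three separate fits a.fit, b.fit, c.fit: it gives
# the same fitted values, just parameterised differently.
all$TYPE <- factor(all$TYPE)
joint.fit <- glm(cbind(WHITE, ALL - WHITE) ~ TYPE * A * B * C * D,
                 family = binomial(link = "logit"), data = all)

# A reduced model in which TYPE only shifts the intercept, while the
# A-D effects are shared across the three types.
main.fit <- glm(cbind(WHITE, ALL - WHITE) ~ TYPE + A * B * C * D,
                family = binomial(link = "logit"), data = all)

# Likelihood-ratio test: do the A-D effects really differ by TYPE?
anova(main.fit, joint.fit, test = "Chisq")

_________________________________________________________________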