On Thu, 19 Mar 2009, Maggie Wang wrote:
Dear Thomas,
Thank you very much for your answer!
But why does this happen only for some models and not for all of them?
That is, why can it drop some variables for other models but not for
this one?
Presumably the other models don't have perfect separation. If you don't have
enough data for reliable estimation you will get many models that predict
poorly and a few that predict extremely well, just by chance.
-thomas
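The point about chance can be illustrated with a small simulation. This is only a sketch on made-up data, not the poster's data: the predictors are pure noise, the sample size of 40 and the predictor counts are arbitrary choices, and the helper name separates() is hypothetical. A fit counts as separated when its linear predictor classifies every observation correctly.

```r
# Hypothetical simulation: the outcome is independent of every predictor,
# so any perfect fit is due to chance alone.
set.seed(42)

separates <- function(n, p) {
  x <- matrix(rnorm(n * p), nrow = n)      # p noise predictors
  y <- rbinom(n, size = 1, prob = 0.5)     # outcome unrelated to x
  fit <- suppressWarnings(glm(y ~ x, family = binomial))
  # TRUE when the fitted linear predictor classifies every observation correctly
  all((predict(fit) > 0) == (y == 1))
}

n <- 40        # observations per simulated data set (arbitrary)
n_sim <- 50    # simulated data sets per setting (arbitrary)

# Proportion of noise-only data sets that end up perfectly separated
prop_separated <- sapply(c(5, 15, 25), function(p) {
  mean(replicate(n_sim, separates(n, p)))
})
names(prop_separated) <- c("p = 5", "p = 15", "p = 25")
print(prop_separated)
```

With the number of observations held fixed, the proportion of perfectly separated fits should grow as predictors are added, even though none of the predictors carries any information about the outcome.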
Thanks!!
Best regards,
Maggie
On Wed, Mar 18, 2009 at 3:38 PM, Thomas Lumley <tlum...@u.washington.edu> wrote:
With 30 variables and only 55 residual degrees of freedom you probably have
perfect separation due to not having enough data. Look at the coefficients
-- they are infinite, implying perfect overfitting.
-thomas
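A minimal sketch of how the symptoms of perfect separation can be checked in R. The data below are invented so that the sign of x determines y exactly; the thresholds used to flag a problem are arbitrary assumptions, not standard cut-offs.

```r
# Hypothetical data: y is 1 exactly when x > 0, so the classes are
# completely separated by x.
x <- c(seq(-2, -0.2, length.out = 10), seq(0.2, 2, length.out = 10))
y <- as.integer(x > 0)

# Fit the model and collect any warnings instead of letting them scroll past
warns <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Symptom 1: glm() warns about the fit
print(warns)

# Symptom 2: fitted probabilities are all essentially 0 or 1
p_hat <- fitted(fit)
probs_extreme <- all(p_hat < 1e-3 | p_hat > 1 - 1e-3)

# Symptom 3: a coefficient that is very large on the logit scale
coef_huge <- max(abs(coef(fit))) > 10

print(coef(fit))
print(c(probs_extreme = probs_extreme, coef_huge = coef_huge))
```

When these checks fire, the reported standard errors and p-values should not be trusted; the usual remedies are fewer candidate variables or more data.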
On Wed, 18 Mar 2009, Maggie Wang wrote:
Dear R-users,
I use glm() to do logistic regression and stepAIC() to do stepwise model
selection.
The AIC is usually about 100, and a good fit can be as low as around 70.
But for some models the AIC goes to extreme values like 1000. When I check
the p-values, all the independent variables (about 30 of them) included in
the equation are highly significant, which is impossible, because we
expect some to be dropped. This situation is not uncommon.
The summary output looks like this:
Coefficients:
              Estimate Std. Error   z value Pr(>|z|)
(Intercept)  4.883e+14  1.671e+07  29217415   <2e-16 ***
g761        -5.383e+14  9.897e+07  -5438529   <2e-16 ***
g2809       -1.945e+15  1.082e+08 -17977871   <2e-16 ***
g3106       -2.803e+15  9.351e+07 -29976674   <2e-16 ***
g4373       -9.272e+14  6.534e+07 -14190077   <2e-16 ***
g4583       -2.279e+15  1.223e+08 -18640563   <2e-16 ***
g761:g2809  -5.101e+14  4.693e+08  -1086931   <2e-16 ***
g761:g3106  -3.399e+16  6.923e+08 -49093218   <2e-16 ***
g2809:g3106  3.016e+15  6.860e+08   4397188   <2e-16 ***
g761:g4373   3.180e+15  4.595e+08   6920270   <2e-16 ***
g2809:g4373 -5.184e+15  4.436e+08 -11685382   <2e-16 ***
g3106:g4373  1.589e+16  2.572e+08  61788148   <2e-16 ***
g761:g4583  -1.419e+16  8.199e+08 -17303033   <2e-16 ***
g2809:g4583 -2.540e+16  8.151e+08 -31156781   <2e-16 ***
........
(omit)
........
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 120.32 on 86 degrees of freedom
Residual deviance: 1009.22 on 55 degrees of freedom
AIC: 1073.2
Number of Fisher Scoring iterations: 25
Could anyone suggest what this means? How can I perform a reliable
logistic regression?
Thank you so much for the help!
Best Regards,
Maggie
Thomas Lumley Assoc. Professor, Biostatistics
tlum...@u.washington.edu University of Washington, Seattle
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.