Dear Thomas,

Thank you very much for the answer!
But why does this happen only for some models and not for all of them? That is, why can the selection drop some variables for other models but not for this one?

Thanks!!

Best regards,
Maggie

On Wed, Mar 18, 2009 at 3:38 PM, Thomas Lumley <tlum...@u.washington.edu> wrote:
>
> With 30 variables and only 55 residual degrees of freedom you probably have
> perfect separation due to not having enough data. Look at the coefficients
> -- they are infinite, implying perfect overfitting.
>
>     -thomas
>
> On Wed, 18 Mar 2009, Maggie Wang wrote:
>
>> Dear R-users,
>>
>> I use glm() to do logistic regression and use stepAIC() to do stepwise
>> model selection.
>>
>> The common AIC value comes out is about 100, a good fit is as low as
>> around 70. But for some model, the AIC went to extreme values like 1000.
>> When I check the P-values, All the independent variables (about 30 of
>> them) included in the equation are very significant, which is impossible,
>> because we expect some would be dropped. This situation is not uncommon.
>>
>> A summary output like this:
>>
>> Coefficients:
>>               Estimate Std. Error   z value Pr(>|z|)
>> (Intercept)  4.883e+14  1.671e+07  29217415   <2e-16 ***
>> g761        -5.383e+14  9.897e+07  -5438529   <2e-16 ***
>> g2809       -1.945e+15  1.082e+08 -17977871   <2e-16 ***
>> g3106       -2.803e+15  9.351e+07 -29976674   <2e-16 ***
>> g4373       -9.272e+14  6.534e+07 -14190077   <2e-16 ***
>> g4583       -2.279e+15  1.223e+08 -18640563   <2e-16 ***
>> g761:g2809  -5.101e+14  4.693e+08  -1086931   <2e-16 ***
>> g761:g3106  -3.399e+16  6.923e+08 -49093218   <2e-16 ***
>> g2809:g3106  3.016e+15  6.860e+08   4397188   <2e-16 ***
>> g761:g4373   3.180e+15  4.595e+08   6920270   <2e-16 ***
>> g2809:g4373 -5.184e+15  4.436e+08 -11685382   <2e-16 ***
>> g3106:g4373  1.589e+16  2.572e+08  61788148   <2e-16 ***
>> g761:g4583  -1.419e+16  8.199e+08 -17303033   <2e-16 ***
>> g2809:g4583 -2.540e+16  8.151e+08 -31156781   <2e-16 ***
>> ........
>> (omit)
>> ........
>>
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>>     Null deviance:  120.32  on 86  degrees of freedom
>> Residual deviance: 1009.22  on 55  degrees of freedom
>> AIC: 1073.2
>>
>> Number of Fisher Scoring iterations: 25
>>
>> Could anyone suggest what does this mean? How can I perform a reliable
>> logistic regression?
>>
>> Thank you so much for the help!
>>
>> Best Regards,
>> Maggie
>>
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlum...@u.washington.edu        University of Washington, Seattle
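
For reference, the perfect separation mentioned in the quoted reply can be reproduced on a toy data set. This is only a minimal sketch in R and not the data from the thread: the vectors x and y are made up for illustration, and the checks are one possible way to spot the problem. It shows glm() returning a fit whose slope is very large and whose fitted probabilities are pushed to 0 and 1, which is what an "infinite" coefficient looks like in practice.

```r
# Hypothetical toy data: y is 1 exactly when x > 0, so x separates y perfectly.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# glm() still returns a fit here; the warnings it would emit are suppressed
# so that the script runs quietly.
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# The likelihood keeps increasing as the slope grows, so no finite maximum
# likelihood estimate exists; the reported slope is just where iteration stopped.
print(coef(fit))

# Fitted probabilities are numerically 0 or 1 for every observation.
print(range(fitted(fit)))

# Number of Fisher scoring iterations used; glm.control() caps this at
# maxit = 25 by default.
print(fit$iter)

# A simple separation check: are all fitted probabilities at the boundary?
separated <- all(fitted(fit) < 1e-4 | fitted(fit) > 1 - 1e-4)
print(separated)
```

In the output quoted above, the reported 25 Fisher scoring iterations match the default iteration limit, and the residual deviance is larger than the null deviance, so that fit probably stopped without converging. The sketch only illustrates the basic mechanism and does not reproduce that exact behaviour.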