Re: [R] Extreme AIC or BIC values in glm(), logistic regression

Thomas Lumley Thu, 19 Mar 2009 01:00:15 -0700

On Thu, 19 Mar 2009, Maggie Wang wrote:

Dear Thomas,


Thank you very much for answering!

But why does this happen only with some models and not with others? That is,
why can some variables be dropped for other models but not for this one?


Presumably the other models don't have perfect separation.  If you don't have
enough data for reliable estimation, you will get many models that predict
poorly and a few that predict extremely well, just by chance.

     -thomas
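A small simulation can make the "just by chance" point concrete. The sketch below is written in Python rather than R and uses a deliberately simplified setting that is not from the thread: a hypothetical data set of 5 events and 5 non-events, and 30 pure-noise continuous predictors, each checked on its own for perfect separation. A single continuous predictor separates the classes only when the two outcome groups fall into one of the two extreme rank orderings, which happens with probability 2 / C(10, 5) = 2/252 for each noise predictor. With 30 independent noise predictors, the chance that at least one of them separates the outcome is 1 - (1 - 2/252)^30, roughly 0.21. In a real multivariable fit, separation by a linear combination of many terms happens at least as often as separation by any single term.

```python
import math
import random


def separates(x, y):
    """True if predictor x perfectly separates the 0/1 outcome y: every case with
    y == 1 lies strictly above, or strictly below, every case with y == 0."""
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    return min(ones) > max(zeros) or max(ones) < min(zeros)


def exact_prob_single(n1, n0):
    """Probability that one continuous pure-noise predictor separates n1 ones from
    n0 zeros: only 2 of the C(n1 + n0, n1) equally likely rank patterns do so."""
    return 2 / math.comb(n1 + n0, n1)


def simulate(n1, n0, n_predictors, trials, seed=0):
    """Fraction of simulated data sets in which at least one of n_predictors
    independent N(0, 1) noise predictors separates the outcome on its own."""
    rng = random.Random(seed)
    y = [1] * n1 + [0] * n0
    hits = 0
    for _ in range(trials):
        for _ in range(n_predictors):
            x = [rng.gauss(0.0, 1.0) for _ in y]
            if separates(x, y):
                hits += 1
                break  # one separating predictor is enough for this data set
    return hits / trials


if __name__ == "__main__":
    p_one = exact_prob_single(5, 5)   # 2/252, about 0.008
    p_any = 1 - (1 - p_one) ** 30     # about 0.21
    print(p_one, p_any, simulate(5, 5, 30, 5000))
```

With a fixed seed the simulated fraction should land close to the exact value: roughly one in five such data sets contains a noise variable that looks like a perfect predictor.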

Thanks!!

Best regards,
Maggie



On Wed, Mar 18, 2009 at 3:38 PM, Thomas Lumley <tlum...@u.washington.edu> wrote:


With 30 variables and only 55 residual degrees of freedom, you probably have
perfect separation due to not having enough data.  Look at the coefficients:
they are effectively infinite (of the order of 1e14 to 1e16), implying perfect
overfitting.

     -thomas
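Why the coefficients run off towards infinity can be seen on a toy problem. The sketch below (Python, with a hypothetical six-point data set that is perfectly separated at x = 3.5) fits logit P(y = 1) = b0 + b1 x by Fisher scoring, which for the logit link is the same as Newton-Raphson and is the iteration count that summary.glm() reports. Because the likelihood keeps increasing as the fitted curve gets steeper, each iteration pushes b1 further up (and b0 further down) while the deviance shrinks towards zero, so there is no finite maximum to converge to. R's glm() stops when the relative change in deviance falls below glm.control()'s epsilon (default 1e-8) or after maxit iterations (default 25); the 25 Fisher scoring iterations in the output in this thread match that default limit, which suggests the fit stopped at the iteration cap rather than converging.

```python
import math


def expit(eta):
    # Logistic function, written to avoid overflow for large |eta|
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def softplus(t):
    # log(1 + exp(t)) without overflow
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def deviance(b0, b1, x, y):
    # Binomial deviance for 0/1 data, i.e. -2 * log-likelihood
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        total += softplus(-eta) if yi == 1 else softplus(eta)
    return 2.0 * total


def fisher_scoring(x, y, iterations):
    """Fit logit P(y = 1) = b0 + b1 * x from (0, 0); return (b0, b1) after each step."""
    b0, b1 = 0.0, 0.0
    history = []
    for _ in range(iterations):
        p = [expit(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector U = X'(y - p)
        u0 = sum(yi - pi for yi, pi in zip(y, p))
        u1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Fisher information I = X'WX (symmetric 2 x 2)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Scoring step: solve I * delta = U by Cramer's rule
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
        history.append((b0, b1))
    return history


# Hypothetical data: y = 1 exactly when x > 3.5 (perfect separation)
x_sep = [1, 2, 3, 4, 5, 6]
y_sep = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for step, (b0, b1) in enumerate(fisher_scoring(x_sep, y_sep, 25), start=1):
        print(step, round(b0, 2), round(b1, 2), deviance(b0, b1, x_sep, y_sep))
```

Printing the history shows b1 climbing by about 2 per iteration once the fit nearly separates the two groups, while the deviance heads to zero. This is a toy version of the 1e14-sized estimates in the summary output.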

On Wed, 18 Mar 2009, Maggie Wang wrote:

Dear R-users,

I use glm() to do logistic regression and stepAIC() to do stepwise model
selection.

The typical AIC value is about 100, and a good fit is as low as about 70.
But for some models, the AIC goes to extreme values like 1000. When I check
the p-values, all the independent variables (about 30 of them) included in
the equation are highly significant, which is impossible, because we expect
some of them to be dropped.  This situation is not uncommon.

The summary output looks like this:

Coefficients:
                            Estimate Std. Error   z value Pr(>|z|)
(Intercept)                   4.883e+14  1.671e+07  29217415   <2e-16 ***
g761                         -5.383e+14  9.897e+07  -5438529   <2e-16 ***
g2809                        -1.945e+15  1.082e+08 -17977871   <2e-16 ***
g3106                        -2.803e+15  9.351e+07 -29976674   <2e-16 ***
g4373                        -9.272e+14  6.534e+07 -14190077   <2e-16 ***
g4583                        -2.279e+15  1.223e+08 -18640563   <2e-16 ***
g761:g2809                   -5.101e+14  4.693e+08  -1086931   <2e-16 ***
g761:g3106                   -3.399e+16  6.923e+08 -49093218   <2e-16 ***
g2809:g3106                   3.016e+15  6.860e+08   4397188   <2e-16 ***
g761:g4373                    3.180e+15  4.595e+08   6920270   <2e-16 ***
g2809:g4373                  -5.184e+15  4.436e+08 -11685382   <2e-16 ***
g3106:g4373                   1.589e+16  2.572e+08  61788148   <2e-16 ***
g761:g4583                   -1.419e+16  8.199e+08 -17303033   <2e-16 ***
g2809:g4583                  -2.540e+16  8.151e+08 -31156781   <2e-16 ***
........
(omit)
........

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  120.32  on 86  degrees of freedom
Residual deviance: 1009.22  on 55  degrees of freedom
AIC: 1073.2

Number of Fisher Scoring iterations: 25

Could anyone explain what this means?  How can I perform a reliable
logistic regression?

Thank you so much for the help!

Best Regards,
Maggie
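The figures in the summary output above can be cross-checked with a little arithmetic. This assumes (the thread does not say) that the response is coded 0/1 with one observation per row, so that the deviance equals -2 times the log-likelihood. The null model has 86 + 1 = 87 observations, the fitted model uses 87 - 55 = 32 coefficients, and AIC = deviance + 2k = 1009.22 + 64 = 1073.22, matching the reported 1073.2. The same numbers also expose the numerical failure. A correctly maximized model that contains the intercept can never have a larger deviance than the intercept-only model, yet the residual deviance (1009.22) is far above the null deviance (120.32). Each z value is simply Estimate / Std. Error, so two enormous numbers give an enormous ratio and a p-value below 2e-16 that should not be trusted. The Python helpers below only encode this arithmetic; their names and arguments are hypothetical.

```python
def check_glm_summary(null_dev, null_df, resid_dev, resid_df):
    """Hypothetical helper: recompute n, k and AIC from a binomial glm summary,
    assuming a 0/1 response so that deviance = -2 * log-likelihood."""
    n = null_df + 1                    # intercept-only model uses 1 degree of freedom
    k = n - resid_df                   # number of estimated coefficients
    aic = resid_dev + 2 * k            # AIC = -2 log L + 2k
    fit_failed = resid_dev > null_dev  # impossible for a properly maximized fit
    return n, k, aic, fit_failed


def wald_z(estimate, std_error):
    # The "z value" column is just Estimate / Std. Error
    return estimate / std_error


if __name__ == "__main__":
    # Numbers taken from the summary output in this thread
    print(check_glm_summary(120.32, 86, 1009.22, 55))
    # (Intercept) row: about 2.92e7 from the rounded estimates; R printed 29217415
    print(wald_z(4.883e14, 1.671e7))
```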



Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
