Oops. In my previous email I meant to say the following: In the AIC approach, you include a new variable or delete an existing variable when the change in the "-2 log-likelihood" value (the deviance) is 2 or more.
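The arithmetic behind this rule can be sketched in a few lines of Python. This is an illustration added here, not code from the thread, and the log-likelihoods and parameter counts below are made-up numbers. With AIC = -2 log L + 2k, adding one coefficient changes AIC by 2 minus the likelihood-ratio statistic 2 * (change in log L). The larger model therefore wins on AIC exactly when that statistic exceeds 2, which for a 1-df chi-squared test means a p-value below about 0.157. That is where the "roughly 0.15" figure comes from.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: AIC = -2 * log-likelihood + 2 * k."""
    return -2.0 * log_lik + 2.0 * n_params


def chisq1_upper_tail(x: float) -> float:
    """P(X > x) for X ~ chi-squared with 1 degree of freedom.

    X = Z**2 for a standard normal Z, so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical log-likelihoods of two nested fits; the larger model
# has one extra coefficient (illustrative numbers, not from the thread).
loglik_small, k_small = -520.0, 4
loglik_big, k_big = -519.2, 5

# Nested models are compared by the difference in AIC, not the ratio.
delta_aic = aic(loglik_big, k_big) - aic(loglik_small, k_small)

# Likelihood-ratio statistic = drop in -2 log-likelihood (the deviance).
lrt_stat = 2.0 * (loglik_big - loglik_small)
p_value = chisq1_upper_tail(lrt_stat)

# With one extra parameter, delta_aic == 2 - lrt_stat, so AIC prefers
# the larger model exactly when lrt_stat > 2.
cutoff_p = chisq1_upper_tail(2.0)  # about 0.157

print(f"AIC change (big - small): {delta_aic:+.2f}")
print(f"LRT statistic: {lrt_stat:.2f}, p-value: {p_value:.3f}")
print(f"p-value at the break-even point (LRT = 2): {cutoff_p:.3f}")
print("AIC prefers larger model:", delta_aic < 0)
```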
Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


----- Original Message -----
From: Ravi Varadhan <rvarad...@jhmi.edu>
Date: Saturday, June 13, 2009 1:40 pm
Subject: Re: [R] Insignificant variable improves AIC (multinom)?
To: Werner Wernersen <pensterfuz...@yahoo.de>
Cc: r-h...@stat.math.ethz.ch

> Hi Werner,
>
> AICs of nested models are compared on an additive scale, not on a
> multiplicative scale. So, you have to think about how much the AIC
> decreases when you add the new variable, not the factor by which it
> is reduced.
>
> If you are doing a stepwise selection based on AIC, then the p-value
> approach and the AIC approach are related. In the AIC approach, you
> include a new variable or delete an existing variable when the change
> in AIC score is 2 or more. In the stepwise likelihood ratio test
> (LRT, a.k.a. F-test in linear regression) used to select variables,
> the AIC score change of 2 corresponds roughly to a p-value of 0.15,
> i.e. entering or deleting a variable if the p-value for the LRT is
> less than 0.15.
>
> Of course, the big issue is that the sampling properties of stepwise
> model selection procedures are extremely difficult to characterize.
> Resampling and cross-validation approaches can help address this
> problem. Another, more principled approach to model selection is to
> use regularization methods (e.g. ridge, lasso). But there is no free
> lunch: in regularization methods, one has to decide on the degree of
> regularization.
>
> I hope I have successfully convinced you of the perils and
> pitfalls of model selection.
>
> Best,
> Ravi.
> ____________________________________________________________________
>
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
>
> Ph. (410) 502-2619
> email: rvarad...@jhmi.edu
>
>
> ----- Original Message -----
> From: Werner Wernersen <pensterfuz...@yahoo.de>
> Date: Saturday, June 13, 2009 10:52 am
> Subject: Re: [R] Insignificant variable improves AIC (multinom)?
> To: Peter Flom <peterflomconsult...@mindspring.com>, r-h...@stat.math.ethz.ch
>
> > > > Hi,
> > > >
> > > > I am trying to specify a multinomial logit model using the
> > > > multinom function from the nnet package. Now I add another
> > > > independent variable and it halves the AIC as given by
> > > > summary(multinom()). But when I call Anova(multinom()) from the
> > > > car package, it tells me that this added variable is
> > > > insignificant (Pr(>Chisq)=0.39). Thus, the improved AIC suggests
> > > > keeping the variable but the Anova suggests dropping it.
> > > >
> > > > I am sure this is due to my lack of understanding of these
> > > > models, but could someone give me a pointer to what my mistake is?
> > >
> > > I am not sure why you would expect the same answer from the AIC and
> > > the p-value. They answer different questions. AIC attempts to
> > > answer a question about overall model fit. The p-value for a
> > > particular variable attempts to answer whether that particular
> > > coefficient could be due to chance if the population value of the
> > > parameter was 0.
> > >
> > > One way these could give different answers is if the new variable
> > > affected the parameter estimates for the other parameters.
> > >
> > > It's yet another exemplar of the problems with using p-values for
> > > model selection.
> > >
> > > HTH
> > >
> > > Peter
> > >
> > > Peter L. Flom, PhD
> > > Statistical Consultant
> > > www DOT peterflomconsulting DOT com
> >
> > That was very enlightening. I have to read up on model selection.
> > The thought I have to get my head around is that the added variable
> > helps explain the observed variability in the data and thus should
> > be retained in the model. But since the coefficient is
> > insignificant, I cannot interpret it, and if I use this equation for
> > predictions then I add a "random" value, since I cannot reject that
> > the coefficient is actually zero instead of what I estimated.
> >
> > One just never sees anyone presenting regression coefficients that
> > are not significant, although model selection procedures are often
> > based on the AIC...
> >
> > Have a good weekend,
> > Werner

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.