To pretend that AIC solves this problem is to ignore that AIC is just a restatement of P-values. Frank
Rubén Roa wrote > > I think we have gone through this before. > First, the destruction of all aspects of statistical inference is not at > stake, Frank Harrell's post notwithstanding. > Second, checking all pairs is a way to see for _all pairs_ which model > outcompetes which in terms of predictive ability by -2AIC or more. Just > sorting them by the AIC does not give you that if no model is better than > the next best by less than 2AIC. > Third, I was not implying that AIC differences play the role of > significance tests. I agree with you that model selection is better not > understood as a proxy or as a relative of significance tests procedures. > Incidentally, when comparing many models the AIC is often inconclusive. If > one is bent on selecting just _the model_ then I check numerical > optimization diagnostics such as size of gradients, KKT criteria, and > other issues such as standard errors of parameter estimates and the > correlation matrix of parameter estimates. For some reasons I don't find > model averaging appealing. I guess deep in my heart I expect more from my > model than just the best predictive ability. > > Rubén > > -----Mensaje original----- > De: r-help-bounces@ [mailto:r-help-bounces@] En nombre de Ben Bolker > Enviado el: miércoles, 25 de enero de 2012 15:41 > Para: r-help@.ethz > Asunto: Re: [R]How do I compare 47 GLM models with 1 to 5 interactions and > unique combinations? > > Rubén Roa <rroa <at> azti.es> writes: > >> A more 'manual' way to do it is this. > >> If you have all your models named properly and well organized, say >> Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces >> a list with one component named 'aic') then something like >> this: > >> x <- matrix(0,1081,3) x[,1:2] <- t(combn(47,2)) for(i in 1:n){x[,3] >> <- abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic[1,])- >> unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))} > > >> will calculate all the 1081 AIC differences between paired comparisons >> of the 47 models. It may not be pretty but works for me. > > >> An AIC difference equal or less than 2 is a tie, anything higher is >> evidence for the model with the less AIC (Sakamoto et al., Akaike >> Information Criterion Statistics, KTK Scientific Publishers, Tokyo). > > > I wouldn't go quite as far as Frank Harrell did in his answer, but > > (1) it seems silly to do all pairwise comparisons of models; one of the > big advantages of information-theoretic approaches is that you can just > list the models in order of AIC ... > > (2) the dredge() function from the MuMIn package may be useful for > automating this sort of thing. There is an also an AICtab function in the > emdbook package. > > (3) If you're really just interested in picking the single model with the > best expected predictive capability (and all of the other assumptions of > the AIC approach are met -- very large data set, good fit to the data, > etc.), then just picking the model with the best AIC will work. It's when > you start to make inferences on the individual parameters within that > model, without taking account of the model selection process, that you run > into trouble. If you are really concerned about good predictions then it > may be better to do multi-model averaging *or* use some form of shrinkage > estimator. > > (4) The "delta-AIC<2 means pretty much the same" rule of thumb is fine, > but it drives me nuts when people use it as a quasi-significance-testing > rule, as in "the simpler model is adequate if dAIC<2". If you're going to > work in the model selection arena, then don't follow hypothesis-testing > procedures! A smaller AIC means lower expected K-L distance, period. > > For the record, Brian Ripley has often expressed the (minority) opinion > that AIC is *not* appropriate for comparing non-nested models (e.g. > <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>). > > >> >> -----Original Message----- >> From: r-help-bounces <at> r-project.org on behalf of Milan >> Bouchet-Valat >> Sent: Wed 1/25/2012 10:32 AM >> To: Jhope >> Cc: r-help <at> r-project.org > >> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 >> interactions and unique combinations? > > >> Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit : >> > Hi R-listers, >> > >> > I have developed 47 GLM models with different combinations of >> > interactions from 1 variable to 5 variables. I have manually made >> > each model separately and put them into individual tables (organized >> > by the number of variables) showing the AIC score. I want to compare >> all of these models. >> > >> > 1) What is the best way to compare various models with unique >> > combinations and different number of variables? >> See ?step or ?stepAIC (from package MASS) if you want an automated way >> of doing this. >> >> > 2) I am trying to develop the most simplest model ideally. Even >> > though adding another variable would lower the AIC, > > No, not necessarily. > >> how do I interpret it is worth >> > it to include another variable in the model? How do I know when to >> stop? > > When the AIC stops decreasing. > >> This is a general statistical question, not specific to R. As a >> general rule, if adding a variable lowers the AIC by a significant >> margin, then it's worth including it. > > If the variable lowers the AIC *at all* then your best estimate is that > you should include it. > >> You should only stop when a variable increases the AIC. But this is >> assuming you consider it a good indicator and you know what you're >> doing. There's plenty of literature on this subject. >> >> > Definitions of Variables: >> > HTL - distance to high tide line (continuous) Veg - distance to >> > vegetation Aeventexhumed - Event of exhumation Sector - number >> > measurements along the beach Rayos - major sections of beach >> > (grouped sectors) TotalEggs - nest egg density >> > >> > Example of how all models were created: >> > Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, >> > data=data.to.analyze, family=binomial) Model7.glm <- >> > glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial, >> > data.to.analyze) Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) >> > ~ HTL:Veg:TotalEggs, data.to.analyze, family = binomial) Model37.glm >> > <- glm(cbind(Shells, TotalEggs-Shells) ~ >> > HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial) >> To extract the AICs of all these models, it's easier to put them in a >> list and get their AICs like this: >> m <- list() >> m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, >> data=data.to.analyze, family=binomial) >> m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = >> binomial, data.to.analyze) >> >> sapply(m, extractAIC) >> >> Cheers >> > > ______________________________________________ > R-help@ mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ > R-help@ mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331016.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.