Dear List Members, I have encountered two problems when using the step function to select models. To better illustrate the problems, an R image (step.add1.test.RData) which includes the objects needed to run the code (step.add1.test.R) can be found at www.biostat.wisc.edu/~pwang/r-help/<http://www.biostat.wisc.edu/%7Epwang/r-help/>
lm.data.frame have factor variables with 3 levels. The following run shows the first problem. AICs (* and **) are different. I noticed that the Df for rs13482096:rs13483699 is 4, while I think Df should be 6, 2 from rs13483699 and 4 from interactions. When I ran add1 directly, I got Df=6 and AIC 848.75. > step2.bic.out <- step(step.bic.out, scope=list(lower=scope.lower2, upper=scope.upper2), + direction="both", k=log(length(step.bic.out$y)), trace=1) Start: AIC=841.18 pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 Df Deviance AIC + rs13482096:rs13483699 4 216.63 840.63 (*) <none> 233.82 841.18 - rs8254221 2 244.08 842.90 - rs13482096 2 245.20 844.31 ...... Step: AIC=848.75 (**) pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 + rs13482096:rs13483699 > add1(step.bic.out, scope="rs13482096:rs13483699", k=log(length(step.bic.out$y))) Single term additions Model: pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 Df Deviance AIC <none> 233.82 841.18 rs13482096:rs13483699 6 214.28 848.75 (**) When I used add1 to handle terms to be added together and separately, I got different results. The example is shown below. This might explain the discrepancy shown above. > add1(step.bic.out, scope=int.terms[11:12], k=log(length(step.bic.out$y))) Single term additions Model: pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 Df Deviance AIC <none> 233.82 841.18 rs13479085:rs13475933 6 224.95 863.66 rs13480057:rs13475933 4 226.72 854.62 (***) > add1(step.bic.out, scope=int.terms[11], k=log(length(step.bic.out$y))) Single term additions Model: pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 Df Deviance AIC <none> 233.82 841.18 rs13479085:rs13475933 6 224.95 863.66 > add1(step.bic.out, scope=int.terms[12], k=log(length(step.bic.out$y))) Single term additions Model: pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 Df Deviance AIC <none> 233.82 841.18 rs13480057:rs13475933 6 215.95 851.12 (***) Another problem is that the final model seems to be the first model that satisfies (bAIC >= AIC + 1e-07) if steps haven't used up, rather than the one before that. Please see below. > formula(step2.bic.out) pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 + rs13482096:rs13483699 > step2.bic.out$anova Step Df Deviance Resid. Df Resid. Dev AIC 1 NA NA 298 233.8226 841.1784 Any insights would be greatly appreciated. Thanks much ! Ping Wang -- Ping Wang Graduate Student Department of Statistics University of Wisconsin-Madison Medical Sciences Center (1245) 1300 University Avenue Madison, Wisconsin 53706 http://www.biostat.wisc.edu/~pwang/ [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.