And to follow FH and HW: What level of significance are you using? .05 is excessively liberal. Are you adjusting your p-values for the number of possible models? Do you realize that the p-values for dropping a term, being selected as the maximum of a set of p-values, do not follow their usual distributions? How are you compensating for sample size, given that a p-value's significance is a function of sample size? How are you compensating for the fact that each model choice depends on the previous model choices? How do you know your tree of model choices is the optimal one? Have you considered cross-validation? Are you looking for a model that truly describes a phenomenon, or a predictive model that can be used for practical purposes?
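As a minimal sketch of the cross-validation suggestion above: a 10-fold comparison of the full and simplified models by out-of-sample error. The data are simulated purely for illustration; the data frame, formulas, and all object names are assumptions, not from the thread.

## Simulate illustrative data (hypothetical; substitute your own)
set.seed(1)
n   <- 120
dat <- data.frame(A = factor(sample(c("a1", "a2", "a3"), n, replace = TRUE)),
                  B = rnorm(n),
                  C = rnorm(n))
dat$y <- 2 + 1.5 * dat$B - 0.8 * dat$C + rnorm(n)

## 10-fold cross-validated mean squared prediction error for a formula
k     <- 10
folds <- sample(rep(seq_len(k), length.out = n))
cv_mse <- function(form) {
  mean(sapply(seq_len(k), function(i) {
    fit <- lm(form, data = dat[folds != i, ])          # fit on training folds
    mean((dat$y[folds == i] -
            predict(fit, dat[folds == i, ]))^2)        # error on held-out fold
  }))
}
cv_mse(y ~ A * B * C)  # full model
cv_mse(y ~ B + C)      # simplified model

Comparing the two CV errors judges the simplification by predictive performance rather than by a sequence of significance tests, which sidesteps the multiple-testing issues raised above.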
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of hadley wickham
Sent: Wednesday, June 11, 2008 9:34 AM
To: Frank E Harrell Jr
Cc: r-help@r-project.org; ChCh
Subject: Re: [R] model simplification using Crawley as a guide

On Wed, Jun 11, 2008 at 6:42 AM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
> ChCh wrote:
>>
>> Hello,
>>
>> I have consciously avoided using step() for model simplification in
>> favour of manually updating the model by removing non-significant
>> terms one at a time. I'm using The R Book by M.J. Crawley as a guide.
>> It comes as no surprise that my analysis does not proceed as smoothly
>> as Crawley's, and being a beginner, I'm struggling with what to do
>> next. I have a model:
>>
>> lm(y ~ A * B * C)
>>
>> where A is a categorical variable with three levels and B and C are
>> continuous covariates.
>>
>> Following Crawley, I fit the model, then use summary.aov() to
>> identify non-significant terms. I delete non-significant interaction
>> terms one at a time (using update()). After each update() statement,
>> I use anova(modelOld, modelNew) to compare the previous model with
>> the updated one. After removing all the interaction terms, I'm left
>> with:
>>
>> lm(y ~ A + B + C)
>>
>> Again using summary.aov(), I identify A as non-significant, so I
>> remove it, leaving:
>>
>> lm(y ~ B + C)
>>
>> where both remaining predictors are continuous variables.
>>
>> Does it still make sense to use summary.aov(), or should I use
>> summary.lm() instead? Has the analysis switched from an ANCOVA to a
>> regression? The two give different results, so I'm uncertain which
>> summary to accept.
>>
>> Any help would be appreciated!
>>
>
> What is the theoretical basis for removing insignificant terms? How
> will you compensate for this in the final analysis (e.g., how do you
> unbias your estimate of sigma squared)?

And in a similar vein, where are your exploratory graphics? How do you
know that there is a linear relationship between your response and
your predictors? Are the distributional assumptions you are making
appropriate?

Hadley

--
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
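In the spirit of the posting guide, here is a minimal reproducible sketch of the workflow ChCh describes, on simulated data (the simulation and all object names are illustrative assumptions, not code from the thread). It also shows the difference behind the summary.aov()-versus-summary.lm() question: summary() on an lm fit reports partial t-tests for the coefficients, while summary.aov() reports sequential (Type I) F-tests, and in general the two agree only for the last term entered into the model.

## Simulated data for illustration only
set.seed(2)
n   <- 90
dat <- data.frame(A = gl(3, n / 3, labels = c("a1", "a2", "a3")),
                  B = rnorm(n),
                  C = rnorm(n))
dat$y <- 1 + 0.5 * dat$B + 0.7 * dat$C + rnorm(n)

m_full <- lm(y ~ A * B * C, data = dat)
m_drop <- update(m_full, . ~ . - A:B:C)  # drop the three-way interaction
anova(m_drop, m_full)                    # F-test for the dropped term

m_final <- lm(y ~ B + C, data = dat)
summary(m_final)      # regression view: coefficients with partial t-tests
summary.aov(m_final)  # ANOVA view: sequential F-tests for the same fit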