Lucke, Joseph F <Joseph.F.Lucke <at> uth.tmc.edu> writes:

>
> And to follow FH and HW
>
> What level of significance are you using? .05 is excessively liberal.
> Are you adjusting your p-values for the number of possible models? Do
> you realize the p-values for dropping a term, being selected as the
> maximum of a set of p-values, do not follow their usual distributions?
> How are you compensating for sample size, as a p-value's being
> significant is a function of sample size? How are you compensating for
> the fact that the current model choice is dependent on the previous
> model choices? How do you know your tree of model choices is the optimal
> one? Have you considered cross-validation? Are you looking for a model
> that truly describes a phenomenon or a predictive model that can be used
> for practical purposes?
>

Ouch. While Frank Harrell and Joseph Lucke are raising serious issues
about model selection, maybe we could keep in mind that we don't want to
scare off all the students who ever try to use R to figure out basic
statistics.

I would follow Peter Dalgaard's advice (about "drop1") and Hadley
Wickham's (about graphical diagnostics), and if possible bring up the
other issues about model selection with others around you -- if you're a
student, ask your prof. or someone in the stats department. It can be
tough to try to do things right if those around you are still doing them
wrong ...

If you tell us what field you're in we may be able to point you to more
subject-specific references (e.g. Whittingham, Mark J., Philip A.
Stephens, Richard B. Bradbury, and Robert P. Freckleton. 2006. Why do we
still use stepwise modelling in ecology and behaviour? Journal of Animal
Ecology 75, no. 5: 1182-1189).

  Ben Bolker

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.