From: SR Millis <srmil...@yahoo.com>
To: Jin Minming <jminm...@yahoo.com>
Sent: Monday, January 30, 2012 9:25 AM
Subject: Re: [R] Variable selection based on both training and testing data
Jim,

First, stepwise methods for variable selection should be avoided. Frank Harrell discusses this at length in Regression Modeling Strategies.

Second, splitting a dataset into training and validation sets is generally not a good idea unless you have a very large sample, e.g., more than 20,000. As Harrell has discussed, split-sample validation does not provide external validation, is terribly inefficient, and is arbitrary. It's better to specify your model a priori and use the bootstrap to estimate your model's over-optimism. Bootstrapping can be implemented with Harrell's rms package in R.

Scott

~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat®
Professor
Wayne State University School of Medicine
Email: aa3...@wayne.edu
Email: srmil...@yahoo.com
Tel: 313-993-8085

________________________________
To: r-help@r-project.org
Sent: Monday, January 30, 2012 8:14 AM
Subject: [R] Variable selection based on both training and testing data

Dear all,

In regression, variable selection is usually based on the training data alone, using AIC or F values, as in stepAIC. Is there an R package that can take both the training and the test data into account?

For example, I have separate training and test datasets. First, a regression model is fitted to the training data, and then the model is tested on the test data. This process is repeated to find possible optimal models in terms of RMSE or R² on both the training and the test data.

Thanks,
Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
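The bootstrap estimate of over-optimism suggested in the reply can be illustrated with a minimal sketch. It is written in Python with NumPy rather than R, and it is not the rms implementation. The function names (fit_ols, r_squared, optimism_corrected_r2) are hypothetical. The sketch assumes a prespecified ordinary least-squares model and uses R² as the only performance measure. For each bootstrap resample, it fits the model to the resample, computes R² on that resample and on the original data, and records the difference as the optimism. It then subtracts the average optimism from the apparent R² of the full-data fit.

```python
import numpy as np


def fit_ols(X, y):
    """Hypothetical helper: least-squares coefficients for y ~ 1 + X (intercept added here)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta


def r_squared(beta, X, y):
    """Hypothetical helper: R^2 of coefficients beta (intercept first) evaluated on (X, y)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    resid = y - Xd @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def optimism_corrected_r2(X, y, n_boot=200, seed=0):
    """Return (apparent R^2, bootstrap optimism-corrected R^2) for a prespecified OLS model."""
    rng = np.random.default_rng(seed)
    n = len(y)

    # Apparent performance: fit and evaluate on the same (full) data.
    apparent = r_squared(fit_ols(X, y), X, y)

    optimism = np.empty(n_boot)
    for b in range(n_boot):
        # Draw n rows with replacement to form a bootstrap sample.
        idx = rng.integers(0, n, size=n)
        Xb, yb = X[idx], y[idx]
        beta_b = fit_ols(Xb, yb)
        # R^2 on the sample the model was fitted to (optimistic)...
        r2_boot = r_squared(beta_b, Xb, yb)
        # ...versus R^2 on the original data, which stands in for new data.
        r2_orig = r_squared(beta_b, X, y)
        optimism[b] = r2_boot - r2_orig

    # Subtract the average optimism from the apparent performance.
    return apparent, apparent - optimism.mean()


if __name__ == "__main__":
    # Simulated example (assumed data): one real predictor plus nine noise predictors.
    rng = np.random.default_rng(42)
    n, p = 100, 10
    X = rng.normal(size=(n, p))
    y = 0.5 * X[:, 0] + rng.normal(size=n)
    apparent, corrected = optimism_corrected_r2(X, y)
    print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

Every observation is used both to fit and to evaluate the model, so no data are set aside as in a training/test split. If variable selection were part of the modelling strategy, it would have to be repeated inside each bootstrap iteration for the optimism estimate to reflect it. This sketch assumes a fixed model with no selection.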