My aim is to select a good subset of predictor variables for my final logit model, built using glm(). What is the best way to cross-validate the results so that they are reliable?
Let's say I have a large dataset of thousands of observations. I split the data into two groups, one for training and one for validation. I first use the training set to build a model, and then run stepAIC() with a forward-backward search. But if I base my variable selection purely on this result, I suppose it will be somewhat skewed by the one-time data split (I use only one training dataset). What is the correct way to perform this variable selection, and are there readily available packages for it?

Similarly, once I have my final set of variables, how should I go about making the final assessment of the model's predictive ability? Cross-validation? Which package?

Thank you in advance,
Jay
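P.S. To keep things concrete, here is a minimal, self-contained sketch of the workflow I described above, on simulated data. The variable names, the number of observations, and the 70/30 split are made up purely for illustration.

library(MASS)   # stepAIC()

## Simulated stand-in for my real data set
set.seed(1)
n   <- 2000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  x4 = rnorm(n), x5 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.8 * dat$x3))

## One-time split into training and validation sets
idx   <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[idx, ]
valid <- dat[-idx, ]

## Full logit model on the training set, then forward-backward stepAIC
full <- glm(y ~ ., data = train, family = binomial)
sel  <- stepAIC(full, direction = "both", trace = FALSE)

## Assessment on the single held-out validation set -- this is the step
## I suspect depends too much on the one particular split
pred <- predict(sel, newdata = valid, type = "response")
mean((valid$y - pred)^2)   # Brier score on the validation data

For the final assessment, is something along the lines of boot::cv.glm() what people usually mean by CV here, or is there a better package? (The misclassification cost function below is just my guess at a sensible choice.)

library(boot)   # cv.glm()

## Hypothetical "final" model refit on the full simulated data set
final <- glm(y ~ x1 + x3, data = dat, family = binomial)

## 10-fold CV estimate of the misclassification rate at a 0.5 cutoff
cost <- function(obs, pred) mean(abs(obs - pred) > 0.5)
cv.glm(dat, final, cost = cost, K = 10)$delta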