My aim is to select a good subset of parameters for my final logit
model built using glm(). What is the best way to cross-validate the
results so that they are reliable?

Let's say that I have a large dataset of thousands of observations. I
split this data into two groups, one that I use for training and the
other for validation. First I use the training set to build a model,
and then stepAIC() with a forward-backward search. But if I base my
parameter selection purely on this result, I suppose it will be
somewhat skewed by the one-time data split (I use only one training
dataset).
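
For concreteness, roughly what I am doing now looks like the sketch
below (the data frame 'dat', the binary outcome 'y' and the candidate
predictors are placeholders for my actual data):

library(MASS)

set.seed(1)
idx   <- sample(nrow(dat), size = floor(0.7 * nrow(dat)))  # one-time split
train <- dat[idx, ]
valid <- dat[-idx, ]

null_fit <- glm(y ~ 1, data = train, family = binomial)
full_fit <- glm(y ~ ., data = train, family = binomial)

## forward-backward (direction = "both") search by AIC
sel_fit <- stepAIC(null_fit,
                   scope = list(lower = formula(null_fit),
                                upper = formula(full_fit)),
                   direction = "both", trace = FALSE)

## predicted probabilities on the held-out part
p_valid <- predict(sel_fit, newdata = valid, type = "response")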

What is the correct way to perform this variable selection? And are
there readily available packages for this?

Similarly, when I have my final parameter set, how should I go about
making the final assessment of the model's predictive performance?
Cross-validation? Which package?
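
For example, is something along the lines of boot::cv.glm what I
should be using for that last step? A rough sketch, again with 'dat',
'y', 'x1' and 'x2' as placeholders for my data and the selected
variables:

library(boot)

final_fit <- glm(y ~ x1 + x2, data = dat, family = binomial)

## misclassification rate at a 0.5 cutoff (as in the ?cv.glm examples)
cost <- function(r, pi) mean(abs(r - pi) > 0.5)

cv_res <- cv.glm(dat, final_fit, cost = cost, K = 10)  # 10-fold CV
cv_res$delta[1]   # cross-validated estimate of the error rate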


Thank you in advance,
Jay
