Thank you, Terry, for your answer. 

I'll try to explain my question better. When you create a classification
or regression tree, you first grow a tree based on a splitting criterion;
this usually results in a large tree that fits the training data well.
The problem with such a tree is its potential for overfitting: it can be
tailored too specifically to the training data and fail to generalize to
new data. The solution (apart from cross-validation) is to find a smaller
subtree that yields a low error rate on holdout or validation data.
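
To make this concrete, here is a minimal sketch of that procedure with
rpart, assuming a classification problem; the data sets 'train' and
'valid' and the response 'y' are placeholder names, not from any
existing example:

library(rpart)

## Grow a deliberately large tree on the training data
## (cp = 0 disables the usual complexity-based stopping rule).
## 'train', 'valid', and 'y' below are placeholder names.
fit <- rpart(y ~ ., data = train, method = "class",
             control = rpart.control(cp = 0, minsplit = 2))

## Compute the misclassification rate on the validation set
## for each candidate subtree listed in the cp table.
cps <- fit$cptable[, "CP"]
err <- sapply(cps, function(cp) {
  sub  <- prune(fit, cp = cp)
  pred <- predict(sub, newdata = valid, type = "class")
  mean(pred != valid$y)
})

## Prune at the cp value that minimizes the validation error.
best <- prune(fit, cp = cps[which.min(err)])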

I hope this clarifies my question.

Best,

Alfredo

-----Original Message-----
From: Therneau, Terry M., Ph.D. [mailto:thern...@mayo.edu]

You will need to give more detail of exactly what you mean by "prune using
a validation set". The prune.rpart function will prune at any value you
want; what I suspect you are looking for is to compute the error of each
possible tree using a validation data set, then find the best one, and
then prune there.

How do you define "best"?


