Thank you, Terry, for your answer. I'll try to explain my question better. When you build a classification or regression tree, you first grow a tree based on a splitting criterion; this usually results in a large tree that fits the training data well. The problem with such a tree is its potential for overfitting: it can be tailored too specifically to the training data and fail to generalize to new data. The solution (apart from cross-validation) is to find a smaller subtree that yields a low error rate on holdout or validation data.
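Concretely, what I have in mind is something like the sketch below: prune at each cp value recorded in the fitted tree's cptable and score every candidate subtree on a held-out set. The data frame `d` and response column `y` are made-up placeholders, not from any real data set:

    library(rpart)

    ## Made-up split: half of a data frame `d` (response `y`)
    ## for growing the tree, half held out for validation.
    set.seed(1)
    idx   <- sample(nrow(d), nrow(d) / 2)
    train <- d[idx, ]
    valid <- d[-idx, ]

    ## Grow a deliberately large tree (cp = 0, tiny minsplit).
    fit <- rpart(y ~ ., data = train, method = "class",
                 control = rpart.control(cp = 0, minsplit = 2))

    ## For each cp in the cptable, prune and compute the
    ## misclassification rate on the validation set.
    cps <- fit$cptable[, "CP"]
    err <- sapply(cps, function(cp) {
      sub  <- prune(fit, cp = cp)
      pred <- predict(sub, newdata = valid, type = "class")
      mean(pred != valid$y)
    })

    ## "Best" here = smallest validation error; prune there.
    best <- prune(fit, cp = cps[which.min(err)])

Is this roughly the intended use of prune(), and is taking the cp with the smallest validation error a reasonable definition of "best"?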
Hope this helps to clarify my question.

Best,
Alfredo

-----Original Message-----
From: Therneau, Terry M., Ph.D. [mailto:thern...@mayo.edu]

You will need to give more detail about exactly what you mean by "prune using a validation set". The prune.rpart function will prune at any value you want; what I suspect you are looking for is to compute the error of each possible tree using a validation data set, then find the best one, and then prune there. How do you define "best"?