On Wed, 17 Dec 2008, Tom Cattaert wrote:
Hi,
I am using the packages tree and rpart to build a classification tree to
predict a 0/1 outcome. The package rpart has the advantage that the function
plotcp gives a visual representation of the cross-validation results with a
horizontal line indicating the 1 standard error rule, i.e. the
recommendation to select the most parsimonious model (the smallest tree)
whose error is not more than one standard error above the error of the best
model.
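
The 1 standard error rule described above can be sketched in a few lines. This is only an illustration in plain Python, not rpart's own code: the table values are made up, and the name one_se_choice is hypothetical. Each row holds a tree size (number of leaves), its cross-validated error, and the standard error of that estimate.

```python
# Illustrative cross-validation table: (leaves, CV error, standard error).
# The numbers are invented for this sketch.
cv_table = [
    (1, 1.00, 0.060),
    (2, 0.80, 0.055),
    (5, 0.76, 0.054),
    (8, 0.78, 0.055),
]


def one_se_choice(table):
    """Return the smallest tree within one SE of the best CV error."""
    best = min(table, key=lambda row: row[1])
    threshold = best[1] + best[2]  # height of the horizontal line in the plot
    eligible = [row for row in table if row[1] <= threshold]
    return min(eligible, key=lambda row: row[0]), threshold


chosen, threshold = one_se_choice(cv_table)
print("chosen size:", chosen[0], "threshold:", round(threshold, 3))
```

With these numbers the lowest error belongs to the 5-leaf tree, but the 2-leaf tree falls under the threshold and is the one the rule selects.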
However, rpart does not give me trees of all sizes: in one example I am
working with, the sizes are 1, 2 and 5, whereas cv.tree in package tree
gives 1, 2, 3, 4, 5, as I would expect (weakest-link pruning successively
collapses the internal nodes that contribute the least). What is the
reason for this?
How are we to know without the reproducible example you were asked for?
The pruning sequence need not cover all sizes: which sizes appear depends
on the inputs and the tuning parameters.
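
One way a size can be skipped is sketched below, assuming textbook weakest-link (cost-complexity) pruning. It is a simplified illustration in plain Python with invented error counts, not the code of rpart or tree. Each node stores the error it would have as a leaf ("err"); a node's link strength is the error increase per leaf removed if its whole subtree is collapsed. When a weak split sits above two strong ones, the weakest link is the upper node, so several leaves disappear in one step.

```python
# Toy tree: "err" is the misclassification count if the node were a leaf.
# All numbers are invented for this sketch.
tree = {
    "err": 100,
    "kids": [
        {"err": 20},                        # leaf A
        {"err": 40, "kids": [               # node B: its own split gains little
            {"err": 19, "kids": [{"err": 4}, {"err": 5}]},
            {"err": 19, "kids": [{"err": 4}, {"err": 5}]},
        ]},
    ],
}


def leaves(node):
    return sum(leaves(k) for k in node["kids"]) if "kids" in node else 1


def subtree_err(node):
    return sum(subtree_err(k) for k in node["kids"]) if "kids" in node else node["err"]


def internal_nodes(node):
    if "kids" in node:
        yield node
        for kid in node["kids"]:
            yield from internal_nodes(kid)


def link_strength(node):
    # Error increase per leaf removed when the subtree at node is collapsed.
    return (node["err"] - subtree_err(node)) / (leaves(node) - 1)


def pruning_sequence(root):
    """Collapse the weakest link repeatedly; return the leaf counts seen.

    Note: this modifies the tree in place.
    """
    sizes = [leaves(root)]
    while "kids" in root:
        strengths = [(link_strength(n), n) for n in internal_nodes(root)]
        weakest = min(s for s, _ in strengths)
        for s, node in strengths:
            if s <= weakest + 1e-12:
                node.pop("kids", None)      # turn the node into a leaf
        sizes.append(leaves(root))
    return sizes


sizes = pruning_sequence(tree)
print(sizes)
```

Here node B has strength 22/3, lower than the 10 of each node beneath it, so B is collapsed first and the tree goes straight from 5 leaves to 2, then to 1.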
A second problem I am having in both packages is that the cross-validation
results are highly variable between different runs of the programs. This is
not unexpected, as cross-validation means that the dataset is randomly
divided into 10 equal subsets, which can be done in many different ways.
One then hopes that the results do not depend on this very much, but I have
observed that they often do. Should one then do this many times, e.g. 100, each
time select the model using the 1 standard error rule, and in the end count
which model got selected most often? Or rather do it many times and average
the means and standard errors of the prediction error? Or does a very high
variability in cross-validation results mean that the dataset is too small
to reach conclusions?
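
The two aggregation schemes asked about above can be compared on a toy example. This is a sketch in plain Python under stated assumptions: the three cross-validation tables stand in for three repeated runs and are invented, the helper one_se_size is hypothetical, and nothing here says which scheme is preferable.

```python
from collections import Counter

# Three made-up repeated CV runs: rows are (leaves, CV error, standard error).
runs = [
    [(1, 1.00, 0.060), (2, 0.80, 0.055), (5, 0.70, 0.050)],
    [(1, 1.00, 0.060), (2, 0.78, 0.055), (5, 0.75, 0.050)],
    [(1, 1.00, 0.060), (2, 0.82, 0.055), (5, 0.79, 0.050)],
]


def one_se_size(table):
    """Smallest size whose CV error is within one SE of the minimum."""
    best = min(table, key=lambda row: row[1])
    threshold = best[1] + best[2]
    return min(size for size, err, se in table if err <= threshold)


# Scheme 1: apply the rule in every run and count the selected sizes.
votes = Counter(one_se_size(run) for run in runs)

# Scheme 2: average errors and standard errors per size, then apply the rule once.
n = len(runs)
averaged = [
    (rows[0][0], sum(r[1] for r in rows) / n, sum(r[2] for r in rows) / n)
    for rows in zip(*runs)
]
averaged_choice = one_se_size(averaged)

print("votes:", dict(votes))
print("choice from averaged table:", averaged_choice)
```

With these invented numbers the vote favours the 2-leaf tree while the averaged table selects the 5-leaf tree, so the two schemes need not agree.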
MASS (the book) covers this.
Kind regards and thanks for your help,
Tom
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595