I have a question about the output of randomForest in classification mode, specifically the confusion matrix. The confusion matrix lists the classes and how the forest classified each one, plus the per-class classification error. Are these numbers essentially averages over all the trees in the forest? If so, is there a way to get the standard deviations out of the randomForest object, or do I have to evaluate each tree individually?

By way of illustration, here is the confusion matrix for the iris data. The output below shows that the forest correctly classified 47 versicolor irises, but this is the result for the entire forest. I'd like to know whether every tree classifies 47 versicolor irises correctly; I don't think it will. The same goes for the class.error value: not every tree will have exactly those values, right?
This raises another question. For this example I used the entire data set to grow the forest, so I assume the confusion matrix is based on OOB data. If I instead created a training set and evaluated the trees individually on a test set, I could get averages and standard deviations of the error rate. Any thoughts?

Thanks in advance.

-Miklos Z. Kiss

> print(iris.rf)

Call:
 randomForest(formula = Species ~ ., data = iris, importance = TRUE, keep.forest = TRUE)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of error rate: 5.33%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          5        45        0.10
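To make the train/test idea above concrete, here is a minimal sketch of what I have in mind. It assumes a 70/30 split and a fixed seed (both arbitrary choices of mine), and it relies on predict() with predict.all = TRUE to return each tree's predictions alongside the forest vote. The names train.idx, iris.train, iris.test, tree.err, is.versi and versi.correct are only for illustration.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the split is reproducible

# Hold out a test set (the 70/30 split is an assumption, not a recommendation).
train.idx  <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
iris.train <- iris[train.idx, ]
iris.test  <- iris[-train.idx, ]

iris.rf <- randomForest(Species ~ ., data = iris.train,
                        importance = TRUE, keep.forest = TRUE)

# predict.all = TRUE keeps every tree's prediction:
#   $aggregate  is the forest's majority vote,
#   $individual is a matrix with one row per test case and one column per tree.
pred  <- predict(iris.rf, newdata = iris.test, predict.all = TRUE)
truth <- as.character(iris.test$Species)

# Test-set error rate of each tree taken on its own.
tree.err <- colMeans(pred$individual != truth)
mean(tree.err)  # average single-tree error rate
sd(tree.err)    # spread of the error rate across trees

# Number of versicolor test cases each tree classifies correctly.
is.versi      <- truth == "versicolor"
versi.correct <- colSums(pred$individual[is.versi, , drop = FALSE] == "versicolor")
range(versi.correct)

# Error rate of the whole forest (majority vote), for comparison.
forest.err <- mean(as.character(pred$aggregate) != truth)
```

If this is right, tree.err holds one error rate per tree, so its mean and standard deviation describe the individual trees rather than the forest, and forest.err is the single number that corresponds to the printed confusion matrix (computed here on the test set instead of OOB data).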