Let's try an example:
R> iris.1tree <- randomForest(Species ~ ., data=iris, ntree=1)
R> getTree(iris.1tree, 1)
  left daughter right daughter split var split point status prediction
1             2              3         4        0.80      1          0
2             0              0         0        0.00     -1          1
3             4              5         4        1.75      1          0
4             0              0         0        0.00     -1          2
5             6              7         3        4.85      1          0
6             8              9         1        6.05      1          0
7             0              0         0        0.00     -1          3
8             0              0         0        0.00     -1          2
9             0              0         0        0.00     -1          3
R> iris[1,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
R> predict(iris.1tree, iris[1,], type="prob")
  setosa versicolor virginica
1      1          0         0
R> levels(iris$Species)
[1] "setosa" "versicolor" "virginica"
The getTree() function shows the first (and only) tree. To predict the
first row of iris, we read the tree in the following way. In the first
row (the root node), the variable to split is the 4th, or "Petal.Width".
The splitting point is 0.8, so data points with Petal.Width <= 0.8 go to
the left daughter and the others go to the right. The first row of iris
has Petal.Width = 0.2, so it goes left. Since the "left daughter" is
"2", we look at the second row of the tree, which is a leaf (i.e., a
terminal node) since its status is -1. The prediction is "1", i.e., the
first level of the factor: "setosa". I don't expect anyone to predict
data "manually" like this; predict.randomForest() does all this for you.
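For illustration only, the manual reading above can be written as a few
lines of R. This is a rough sketch, not how the package does it
internally: the tree is copied by hand from the getTree() output above,
walk_tree() is a made-up helper name, and it assumes all predictors are
numeric (categorical splits are encoded differently).

```r
# Tree copied from the getTree() output above (one row per node).
tree <- matrix(c(
  2, 3, 4, 0.80,  1, 0,
  0, 0, 0, 0.00, -1, 1,
  4, 5, 4, 1.75,  1, 0,
  0, 0, 0, 0.00, -1, 2,
  6, 7, 3, 4.85,  1, 0,
  8, 9, 1, 6.05,  1, 0,
  0, 0, 0, 0.00, -1, 3,
  0, 0, 0, 0.00, -1, 2,
  0, 0, 0, 0.00, -1, 3
), ncol = 6, byrow = TRUE,
dimnames = list(NULL, c("left daughter", "right daughter", "split var",
                        "split point", "status", "prediction")))

# Hypothetical helper: follow the splits from the root until a terminal
# node (status -1) is reached, then return its prediction code.
walk_tree <- function(tree, x) {
  node <- 1
  while (tree[node, "status"] == 1) {
    v <- tree[node, "split var"]
    if (x[[v]] <= tree[node, "split point"]) {
      node <- tree[node, "left daughter"]
    } else {
      node <- tree[node, "right daughter"]
    }
  }
  tree[node, "prediction"]
}

x <- iris[1, 1:4]                                   # predictors only
pred_class <- levels(iris$Species)[walk_tree(tree, x)]
pred_class                                          # "setosa"
```

The prediction code indexes into levels(iris$Species), which is why "1"
maps to "setosa".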
As to individual tree predictions, predict.randomForest() has an option
"predict.all" that you can use. To get the OOB votes, though, you will
also need to look at the output of randomForest(..., keep.inbag=TRUE)
(the "inbag" component) to see which data point is OOB for which tree.
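Something along these lines should work for combining the two. Treat it
as a sketch under assumptions: the seed, ntree = 100 and the object
names are arbitrary, and a case is taken to be OOB for a tree when its
"inbag" entry is 0 (depending on the package version that matrix holds
0/1 indicators or in-bag counts, and the test covers both).

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility
rf <- randomForest(Species ~ ., data = iris, ntree = 100,
                   keep.inbag = TRUE)

# Per-tree predictions: $individual is an n x ntree matrix of class labels
pred <- predict(rf, iris, predict.all = TRUE)
ind  <- pred$individual

# TRUE where case i was NOT in the bootstrap sample of tree j
oob <- rf$inbag == 0

# Tally votes per case, using only the trees for which the case is OOB
oob_votes <- t(sapply(seq_len(nrow(iris)), function(i) {
  table(factor(ind[i, oob[i, ]], levels = levels(iris$Species)))
}))
oob_frac <- oob_votes / rowSums(oob_votes)

# If the reconstruction is right, this should be at or near zero
max(abs(oob_frac - rf$votes))
```

Here rf$votes holds the OOB vote fractions computed by the package
itself, so the last line is just a sanity check.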
I hope that's clear now.
Cheers,
Andy
________________________________
From: Chrysanthi A. [mailto:[email protected]]
Sent: Tuesday, April 28, 2009 8:52 AM
To: Liaw, Andy
Cc: [email protected]
Subject: Re: [R] help with random forest package
Many thanks for your help. Sorry for my delayed reply, but I was
away.
Regarding the OOB error, sorry it was a typo.
As for the voting, I was just wondering if there is a
function that will give me the prediction of each case through each
tree. Is there any function that produces the rules for each tree? If I
have a new case and want to predict the class it belongs to, how
can I do that? Should I look at each tree and then tally the votes?
Or are there some predictive rules that I can use? I cannot make that
prediction from the results that the votes give me...
Also, I was wondering why randomization, along with combining
the predictions from the trees, significantly improves the overall
predictive accuracy?
Thanks a lot,
Chrysanthi
2009/4/13 Liaw, Andy <[email protected]>
I really don't understand what you don't understand. Do
you know how a tree forms a prediction? If not, it may be a good idea
to learn about that first. The code runs prediction of each case
through all trees in the forest and that's how the votes are formed.
[For OOB predictions, only predictions from trees for
which the case is out-of-bag are counted. That's why you may get
odd-ball vote fractions even when you grow 100 trees and expect the
votes to be in seq(0, 1, by=0.01).]
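To make the bracketed remark concrete, here is a small sketch of how
many trees a given case is OOB for. It assumes the default bootstrap
sampling (n draws with replacement); n = 150 and ntree = 100 are
illustrative values, not taken from your data.

```r
n     <- 150   # number of cases (illustrative)
ntree <- 100   # number of trees (illustrative)

# Chance that a given case is left out of one bootstrap sample;
# this approaches exp(-1), about 0.368, as n grows
p_oob <- (1 - 1/n)^n
expected_oob_trees <- ntree * p_oob

# Simulate it for case 1: count the bootstrap samples that miss it
set.seed(1)
n_oob <- sum(replicate(ntree, !(1 %in% sample(n, n, replace = TRUE))))

c(p_oob = p_oob, expected = expected_oob_trees, simulated = n_oob)
```

The OOB vote fractions for that case are then multiples of 1/n_oob
rather than 1/ntree, and n_oob differs from case to case.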
100% - 2.34% = 97.66%, not 76.6% (I can only assume you
had a typo).
Cheers,
Andy
________________________________
From: Chrysanthi A. [mailto:[email protected]]
Sent: Monday, April 13, 2009 9:44 AM
To: Liaw, Andy
Cc: [email protected]
Subject: Re: [R] help with random forest package
But how does it estimate that voting output? How
does it get the 85.7% for all the trees?
Regarding the prediction accuracy. If I have OOB
error = 2.34, then the prediction accuracy will be equal to 76.6%,
right?
Many thanks,
Chrysanthi.
2009/4/13 Liaw, Andy <[email protected]>
RF forms prediction by voting. Note
that each row in the output sums to 1. It says 85.7% of the trees
classified the first case as "healthy" and the other 14.3% of the trees
"unhealthy". The majority (in two-class cases like this one) wins, so
the prediction is "healthy".
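As a sketch of that rule in R, using the five rows of votes you posted
(the matrix is typed in by hand here, and ties are not handled
specially):

```r
# Vote fractions copied from the posted output
votes <- matrix(c(0.85714286, 0.14285714,
                  0.92857143, 0.07142857,
                  0.90000000, 0.10000000,
                  0.92857143, 0.07142857,
                  0.84615385, 0.15384615),
                ncol = 2, byrow = TRUE,
                dimnames = list(as.character(1:5),
                                c("healthy", "unhealthy")))

# Each row sums to 1 (up to rounding)
row_totals <- rowSums(votes)

# The predicted class is the column with the largest vote fraction
pred <- colnames(votes)[apply(votes, 1, which.max)]
pred   # all five cases come out as "healthy"
```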
You can take 1 - OOB error rate as the
estimate of prediction accuracy (if you have not selected variables,
e.g., using variable importance, in building the final RF model).
Andy
________________________________
From: Chrysanthi A.
[mailto:[email protected]]
Sent: Friday, April 10, 2009 10:44 AM
To: Liaw, Andy
Cc: [email protected]
Subject: Re: [R] help with random forest
package
Hi,
To be honest, I cannot really understand
the meaning of the votes. For example, with five samples and
two classes, what do the numbers below mean?
    healthy  unhealthy
1 0.85714286 0.14285714
2 0.92857143 0.07142857
3 0.90000000 0.10000000
4 0.92857143 0.07142857
5 0.84615385 0.15384615
Suppose now that, having the classification,
I have an unknown sample. According to the results that I've got, how
can I predict which class it belongs to? Do the votes give us that
prediction?
Also, the error is reported on the "OOB
estimate of error rate", right? For example, if we have OOB estimate of
error rate:2.34%, we can say that the prediction accuracy is approx.
97.7%? How can we estimate the prediction accuracy?
Thanks a lot,
Chrysanthi.
2009/4/8 Liaw, Andy
<[email protected]>
I'm not quite sure what you're asking.
RF predicts by classifying the new observation using all trees in the
forest and taking a plurality vote. The predict() method for randomForest
objects does that for you. The getTree() function shows you what each
individual tree is like (not visually, just the underlying
representation of the tree).
Andy
________________________________
From: Chrysanthi A.
[mailto:[email protected]]
Sent: Wednesday, April 08, 2009 2:56 PM
To: Liaw, Andy
Cc: [email protected]
Subject: Re: [R] help with random forest
package
Many thanks for the reply.
So, having extracted the votes, how can we
determine the classification result? If I want to predict which class
an unknown sample belongs to, what is the rule that will give me
that?
Thanks a lot,
Chrysanthi.
2009/4/8 Liaw, Andy
<[email protected]>
The source code of the whole package is available on CRAN. All packages
are submitted to CRAN in source form.
There's no "rule" per se that gives the final prediction, as the final
prediction is the result of a plurality vote by all trees in the forest.
You may want to look at the varUsed() and getTree() functions.
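A minimal sketch of what those two functions give you, assuming the
iris data and an arbitrary seed and ntree (object names here are just
for illustration):

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only for reproducibility
rf <- randomForest(Species ~ ., data = iris, ntree = 50)

# How often each predictor is used for a split, over the whole forest
used <- varUsed(rf, by.tree = FALSE, count = TRUE)
names(used) <- colnames(iris)[1:4]
used

# Underlying representation of the first tree, with variable names
# shown instead of column indices
tree1 <- getTree(rf, k = 1, labelVar = TRUE)
head(tree1)
```

The same variable can show up at several nodes of one tree, which you
can see in the "split var" column of tree1.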
Andy
From: Chrysanthi A.
> Hello,
>
> I am a PhD student in Bioinformatics and I am using the Random Forest
> package in order to classify my data, but I have some questions.
> Is there a function in order to visualize the trees, so as to
> get the rules?
> Also, could you please provide me with the code of the
> "randomForest" function, as I would like to see how it works. I
> was wondering if I can get the classification having the most votes
> over all the trees in the forest (the final rules that will give me
> the final classification).
> Also, is there a possibility to get a vector with the attributes that
> are being selected for each node during the construction of each tree?
> I mean that I would like to know the m<<M variables that are selected
> at each node out of the M input attributes. Are they selected
> randomly? Is there a possibility to select the same variable in
> subsequent nodes?
>
> Thanks a lot,
>
> Chrysanthi.
Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
New Jersey, USA 08889), and/or its affiliates (which may be known
outside the United States as Merck Frosst, Merck Sharp & Dohme or
MSD and in Japan, as Banyu - direct contact information for affiliates is
available at http://www.merck.com/contact/contacts.html) that may be
confidential, proprietary copyrighted and/or legally privileged. It is
intended solely for the use of the individual or entity named on this
message. If you are not the intended recipient, and have received this
message in error, please notify us immediately by reply e-mail and
then delete it from your system.
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.