Let's try an example:
 
R> iris.1tree <- randomForest(Species ~ ., data=iris, ntree=1)
R> getTree(iris.1tree, 1)
  left daughter right daughter split var split point status prediction
1             2              3         4        0.80      1          0
2             0              0         0        0.00     -1          1
3             4              5         4        1.75      1          0
4             0              0         0        0.00     -1          2
5             6              7         3        4.85      1          0
6             8              9         1        6.05      1          0
7             0              0         0        0.00     -1          3
8             0              0         0        0.00     -1          2
9             0              0         0        0.00     -1          3
R> iris[1,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
R> predict(iris.1tree, iris[1,], type="prob")
  setosa versicolor virginica
1      1          0         0
R> levels(iris$Species)
[1] "setosa"     "versicolor" "virginica" 

The getTree() function shows the first (and only) tree.  To predict the
first row of iris, we read the tree as follows.  In the first row (the
root node), the split variable is the 4th, i.e. "Petal.Width".  The
split point is 0.8, so data points with Petal.Width <= 0.8 go to the
left and all others go to the right.  Row 1 of iris has Petal.Width =
0.2, so it goes left.  Since the "left daughter" is 2, we look at the
second row of the tree, which is a leaf (i.e., a terminal node) because
its status is -1.  The prediction is 1, i.e. the first level of the
factor, "setosa".  I don't expect anyone to predict data "manually"
like this; predict.randomForest() does all this for you.
 
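For illustration only, the manual walk described above can be sketched in base R. The tree matrix is copied from the getTree() output shown earlier; walk_tree() is a hypothetical helper of mine, not a function in the randomForest package, and it assumes that values equal to the split point go to the left daughter.

```r
# Tree matrix copied from the getTree() output above (one row per node).
tree <- matrix(c(
  2, 3, 4, 0.80,  1, 0,
  0, 0, 0, 0.00, -1, 1,
  4, 5, 4, 1.75,  1, 0,
  0, 0, 0, 0.00, -1, 2,
  6, 7, 3, 4.85,  1, 0,
  8, 9, 1, 6.05,  1, 0,
  0, 0, 0, 0.00, -1, 3,
  0, 0, 0, 0.00, -1, 2,
  0, 0, 0, 0.00, -1, 3),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, c("left daughter", "right daughter", "split var",
                          "split point", "status", "prediction")))

# Hypothetical helper: follow the splits from the root until a leaf is reached.
# Assumption: a value equal to the split point goes to the left daughter.
walk_tree <- function(tree, x) {
  node <- 1
  while (tree[node, "status"] != -1) {
    value <- x[[tree[node, "split var"]]]
    if (value <= tree[node, "split point"]) {
      node <- tree[node, "left daughter"]
    } else {
      node <- tree[node, "right daughter"]
    }
  }
  as.integer(tree[node, "prediction"])
}

# Predictors only (columns 1 to 4), so "split var" indexes the right column.
pred <- walk_tree(tree, iris[1, 1:4])
levels(iris$Species)[pred]
```

The prediction stored in a leaf is an index into levels(iris$Species), which is why the last line maps it back to a class label.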
As for individual tree predictions, predict.randomForest() has an option
"predict.all" that you can use.  To get the OOB votes, though, you will
also need to look at the "inbag" component of the output of
randomForest(..., keep.inbag=TRUE) to see which data points are OOB for
which tree.
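A rough sketch of how these two pieces might be combined is below. It assumes the randomForest package is installed; the seed, ntree = 100 and the variable names are arbitrary choices for illustration. I would expect the result to agree with the "votes" component of the fitted object, but treat that as something to check rather than a guarantee.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
rf <- randomForest(Species ~ ., data = iris, ntree = 100, keep.inbag = TRUE)

# Per-tree class predictions: one row per case, one column per tree.
all_pred <- predict(rf, iris, predict.all = TRUE)$individual

# A case is out-of-bag for a tree when its in-bag count for that tree is 0.
oob <- rf$inbag == 0

# For each case, tabulate the predictions of only those trees where it is OOB.
classes <- levels(iris$Species)
oob_votes <- t(sapply(seq_len(nrow(iris)), function(i) {
  preds <- factor(all_pred[i, oob[i, ]], levels = classes)
  as.vector(table(preds)) / sum(oob[i, ])
}))
colnames(oob_votes) <- classes

head(oob_votes)
head(rf$votes)  # OOB vote fractions stored by randomForest, for comparison
```

Because each case is OOB for a different number of trees, the denominators differ from row to row, which is where the "odd-ball" fractions come from.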
 
I hope that's clear now.
 
Cheers,
Andy
 


________________________________

        From: Chrysanthi A. [mailto:[email protected]] 
        Sent: Tuesday, April 28, 2009 8:52 AM
        To: Liaw, Andy
        Cc: [email protected]
        Subject: Re: [R] help with random forest package
        
        

        Many thanks for your help. Sorry for my delayed reply, but I was
        away.
        Regarding the OOB error, sorry, it was a typo.
        
        As for the voting, I was just wondering if there is a function
        that will give me the prediction of each case through each tree.
        Is there any function that produces the rules for each tree? If I
        have a new case and want to predict the class it belongs to, how
        can I do that? Should I look at each tree and then count the
        votes? Or are there some predictive rules that I can use? I cannot
        make that prediction from the results that the votes give me...
        
        Also, I was wondering why randomization, along with combining
        the predictions from the trees, significantly improves the overall
        predictive accuracy?
        
        Thanks a lot,
        
        Chrysanthi
        
        
        
        
        
        2009/4/13 Liaw, Andy <[email protected]>
        

                I really don't understand what you don't understand.  Do
you know how a tree forms a prediction?  If not, it may be a good idea
to learn about that first.  The code runs prediction of each case
through all trees in the forest and that's how the votes are formed.  
                 
                [For OOB predictions, only predictions from trees for
which the case is out-of-bag are counted.  That's why you may get
odd-ball vote fractions even when you grow 100 trees and expect the
votes to be in seq(0, 1, by=0.01).]
                 
                100% - 2.34% = 97.66%, not 76.6% (I can only assume you
had a typo).
                 
                Cheers,
                Andy


________________________________

                        
                        From: Chrysanthi A. [mailto:[email protected]] 
                        
                        Sent: Monday, April 13, 2009 9:44 AM 

                        To: Liaw, Andy
                        Cc: [email protected]
                        Subject: Re: [R] help with random forest package
                        


                        But how does it estimate that voting output? How
does it get the 85.7% for all the trees? 

                        Regarding the prediction accuracy. If I have OOB
error = 2.34, then the prediction accuracy will be equal to 76.6%,
right? 

                        Many thanks,

                        Chrysanthi.


                        2009/4/13 Liaw, Andy <[email protected]>
                        

                                RF forms prediction by voting.  Note
that each row in the output sums to 1.  It says 85.7% of the trees
classified the first case as "healthy" and the other 14.3% of the trees
"unhealthy".  The majority (in two-class cases like this one) wins, so
the prediction is "healthy".
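As a small illustration of the majority rule, the vote table quoted further down in this thread can be turned into class labels in base R. The numbers are copied from that message; the "votes" and "predicted" names are mine, and taking the first class on a tie is an assumption made only for this sketch.

```r
# Vote fractions copied from the question quoted later in this thread.
votes <- matrix(c(0.85714286, 0.14285714,
                  0.92857143, 0.07142857,
                  0.90000000, 0.10000000,
                  0.92857143, 0.07142857,
                  0.84615385, 0.15384615),
                ncol = 2, byrow = TRUE,
                dimnames = list(1:5, c("healthy", "unhealthy")))

# Each row sums to 1 (up to rounding in the printed values).
rowSums(votes)

# The class with the largest vote fraction wins; ties go to the first class
# here, which is an assumption of this sketch.
predicted <- colnames(votes)[max.col(votes, ties.method = "first")]
predicted
```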
                                 
                                You can take 1 - OOB error rate as the
estimate of prediction accuracy (if you have not selected variables,
e.g., using variable importance, in building the final RF model).
                                 
                                Andy


________________________________

                                
                                From: Chrysanthi A.
[mailto:[email protected]] 
                                
                                Sent: Friday, April 10, 2009 10:44 AM 

                                To: Liaw, Andy
                                Cc: [email protected]
                                Subject: Re: [R] help with random forest
package
                                



                                Hi,
                                
                                To be honest, I cannot really understand
                                the meaning of the votes. For example, with
                                five samples and two classes, what do the
                                numbers below mean?
                                      healthy  unhealthy
                                1  0.85714286 0.14285714
                                2  0.92857143 0.07142857
                                3  0.90000000 0.10000000
                                4  0.92857143 0.07142857
                                5  0.84615385 0.15384615
                                
                                Suppose now that, having the classification,
                                I have an unknown sample. Given the results
                                I have, how can I predict which class it
                                belongs to? Do the votes give us that
                                prediction?
                                
                                Also, the error is reported as the "OOB
                                estimate of error rate", right? For example,
                                if the OOB estimate of error rate is 2.34%,
                                can we say that the prediction accuracy is
                                approx. 97.7%? How can we estimate the
                                prediction accuracy?


                                Thanks a lot,
                                
                                Chrysanthi.
                                
                                
                                
                                2009/4/8 Liaw, Andy
<[email protected]>
                                

                                I'm not quite sure what you're asking.
                                RF predicts by classifying the new
                                observation using all trees in the forest
                                and taking a plurality vote.  The predict()
                                method for randomForest objects does that
                                for you.  The getTree() function shows you
                                what each individual tree is like (not
                                visually, just the underlying representation
                                of the tree).
                                 
                                Andy


________________________________

                                From: Chrysanthi A.
[mailto:[email protected]] 
                                Sent: Wednesday, April 08, 2009 2:56 PM
                                To: Liaw, Andy
                                Cc: [email protected]
                                Subject: Re: [R] help with random forest
package
                                
                                
                                Many thanks for the reply.
                                
                                So, having extracted the votes, how do we
                                determine the classification result? If I
                                want to predict which class an unknown
                                sample belongs to, what is the rule that
                                will give me that?
                                
                                Thanks a lot,
                                
                                Chrysanthi.
                                
                                
                                
                                
                                2009/4/8 Liaw, Andy
<[email protected]>
                                

                                The source code of the whole package is
                                available on CRAN.  All packages are
                                submitted to CRAN in source form.
                                
                                There's no "rule" per se that gives the
                                final prediction, as the final prediction is
                                the result of a plurality vote by all trees
                                in the forest.
                                
                                You may want to look at the varUsed()
and getTree() functions.
                                
                                Andy
                                
                                From:  Chrysanthi A.
                                
                                > Hello,
                                >
                                > I am a PhD student in Bioinformatics and I am using the Random Forest
                                > package in order to classify my data, but I have some questions.
                                > Is there a function to visualize the trees, so as to get the rules?
                                > Also, could you please provide me with the code of the "randomForest"
                                > function, as I would like to see how it works. I was wondering if I can
                                > get the classification having the most votes over all the trees in the
                                > forest (the final rules that will give me the final classification).
                                > Also, is there a possibility to get a vector with the attributes that
                                > are being selected for each node during the construction of each tree?
                                > I mean that I would like to know the m<<M variables that are selected
                                > at each node out of the M input attributes. Are they selected randomly?
                                > Is there a possibility to select the same variable in subsequent nodes?
                                >
                                > Thanks a lot,
                                >
                                > Chrysanthi.
                                >
                                
                                >
                                >
______________________________________________
                                > [email protected] mailing list
                                >
https://stat.ethz.ch/mailman/listinfo/r-help
                                > PLEASE do read the posting guide
                                >
http://www.R-project.org/posting-guide.html
                                > and provide commented, minimal,
self-contained, reproducible code.
                                >
                                Notice:  This e-mail message, together
with any attachments, contains
                                information of Merck & Co., Inc. (One
Merck Drive, Whitehouse Station,
                                New Jersey, USA 08889), and/or its
affiliates (which may be known
                                outside the United States as Merck
Frosst, Merck Sharp & Dohme or
                                MSD and in Japan, as Banyu - direct
contact information for affiliates is
                                available at
http://www.merck.com/contact/contacts.html) that may be
                                confidential, proprietary copyrighted
and/or legally privileged. It is
                                intended solely for the use of the
individual or entity named on this
                                message. If you are not the intended
recipient, and have received this
                                message in error, please notify us
immediately by reply e-mail and
                                then delete it from your system.
                                
                                


