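The two importance measures are computed differently.  %IncMSE comes from 
permutation: for each tree the MSE on the out-of-bag data is recorded, then 
recomputed after that variable is randomly permuted, and the increase is 
averaged over all trees (by default it is also normalized by the standard 
deviation of the differences).  If you randomly permute a variable that does 
not gain you anything in prediction, then predictions won't change much and you 
will only see a small change in MSE.  On the other hand, the important 
variables will change the predictions by quite a bit if randomly permuted, so 
you will see bigger changes.  Turn this around and you see that big changes 
indicate important variables.

IncNodePurity involves no permutation.  It is the total decrease in node 
impurity (residual sum of squares, for regression) from the splits made on that 
variable, averaged over all trees.  Here too, bigger values indicate more 
important variables.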
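
To see how the two columns behave when the answer is known in advance, here 
is a minimal sketch in R.  It assumes the randomForest package is installed.  
The simulated data frame d, the variables x1, x2, x3 and the helper mse() are 
made up for illustration.  The hand-rolled check at the end permutes each 
variable in a fresh test set, which is a simplification of the per-tree 
out-of-bag computation behind %IncMSE, so the numbers will not match the 
package's output, only the ordering should.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(42)
n <- 500

# Simulated data (illustrative only): x1 matters a lot, x2 a little, x3 not at all
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 3 * d$x1 + 0.5 * d$x2 + rnorm(n)

# importance = TRUE is needed to get the permutation-based %IncMSE column
rf <- randomForest(y ~ ., data = d, ntree = 300, importance = TRUE)

# Matrix with one row per predictor and columns "%IncMSE" and "IncNodePurity"
imp <- importance(rf)
print(imp)

# Hand-rolled version of the permutation idea, using a fresh test set
test <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
test$y <- 3 * test$x1 + 0.5 * test$x2 + rnorm(n)

# Hypothetical helper: mean squared prediction error of rf on a data frame
mse <- function(data) mean((data$y - predict(rf, newdata = data))^2)

base_mse <- mse(test)
perm_increase <- sapply(c("x1", "x2", "x3"), function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
  mse(shuffled) - base_mse                # increase in MSE caused by permuting v
})
print(perm_increase)
```

Permuting x1 should raise the MSE far more than permuting x3, and x1 should 
also come out on top in both columns of importance(rf).  Use 
importance(rf, type = 1) or importance(rf, type = 2) to extract one measure 
at a time.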

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Mareike Ließ
> Sent: Wednesday, April 28, 2010 12:35 PM
> To: Liaw, Andy
> Cc: r-help@r-project.org
> Subject: Re: [R] Question on: Random Forest Variable Importance for
> RegressionProblems
> 
> Well, the explanation on "importance" says that for regression the first
> column ("%IncMSE")
> is the mean decrease in accuracy and the second ("IncNodePurity") the
> mean decrease in MSE.
> That does not make much sense at all.
> I do not know what "%IncMSE" stands for. Alright, MSE = "mean square
> error", but of what exactly,
> since it is found next to the variables used for prediction?
> And what does "%Inc" refer to?  Percent increase? But again, percent
> increase regarding mean square error?
> Why would I want to increase the mean square error?
> If, as I assume, "IncNodePurity" stands for increase in node
> impurity... why would I want to increase the node impurity?
> It would really help a lot to know how these two values are exactly
> calculated and what they stand for. Neither is clear.
> 
> Thanks
> Mareike
> 
> 
> 
> Liaw, Andy schrieb:
> > I would have thought that the help page for importance() is an (the?)
> obvious place to look...
> >
> > If that description is not clear, please let me know which part isn't
> clear to you.
> >
> > Andy
> >
> > From: Mareike Ließ
> >
> >> I am trying to use the package randomForest to perform regression.
> >> The variable importance estimates are given as:  "%IncMSE"
> >>  and
> >> "IncNodePurity"
> >> Can anyone explain to me what these refer to and how they are
> calculated?
> >> I found a lot of information on variable importance measures for
> >> classification problems, but nothing on regression.
> >>
> >> Thanks a lot.
> >> Mareike
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> 
