Hi Andy,

Thanks so much for your reply!

On the first page of the paper "Classification and regression by
randomForest", it says that the random forest estimates the importance of a
variable by looking at how much the prediction error increases when the
variable is permuted.

In the help document for randomForest, variable importance is measured as the
total decrease in node impurities. Should it be the total *increase* in node
impurities instead?

If it really is the total decrease in node impurities, does that contradict
the paper?
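
To make the question concrete, here is a minimal sketch of what "decrease in
node impurity" means for a single split, as far as I understand it. It uses
base R only; the toy labels and the helper names gini() and
impurity_decrease() are made up for illustration and are not part of the
randomForest package. A split that separates the classes well lowers the
impurity of the child nodes, so a larger decrease would mean a more useful
variable. This is a different quantity from the increase in prediction error
after permutation that the paper describes.

```r
# Gini impurity of a vector of class labels: 1 - sum over classes of p_k^2
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Decrease in Gini impurity when the labels `y` are split into a left child
# (where `left` is TRUE) and a right child (where `left` is FALSE)
impurity_decrease <- function(y, left) {
  n   <- length(y)
  n_l <- sum(left)
  n_r <- n - n_l
  gini(y) - (n_l / n) * gini(y[left]) - (n_r / n) * gini(y[!left])
}

# Toy data (hypothetical): 4 observations of class 0 and 4 of class 1
y      <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))
x_good <- c(1, 2, 3, 4, 5, 6, 7, 8)  # splitting at 4.5 separates the classes
x_poor <- c(1, 5, 2, 6, 3, 7, 4, 8)  # splitting at 4.5 leaves both children mixed

dec_good <- impurity_decrease(y, x_good < 4.5)  # both children pure
dec_poor <- impurity_decrease(y, x_poor < 4.5)  # children as impure as the parent

print(c(good = dec_good, poor = dec_poor))
```

If this reading is right, MeanDecreaseGini adds up such decreases over all
splits made on a variable and averages over the trees, which would explain
why the help page says "decrease" rather than "increase".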

Also, in fit$importance, what do the first two columns mean?

> fit$importance
                         0           1 MeanDecreaseAccuracy MeanDecreaseGini
CT            0.0022352025 0.003829344         0.0030311246         5.184427
DP            0.0069461974 0.016387520         0.0116650960        15.440624
DY            0.0141150255 0.026031690         0.0200603555        19.901538
FC            0.0024279188 0.005158945         0.0037948155         5.527078
NE            0.0352705133 0.070503233         0.0527718526        46.278504
NW            0.0256059127 0.034433862         0.0299981496        26.440402
QT            0.0037228694 0.008181262         0.0059571350         9.308828
SK            0.0048187014 0.008895719         0.0068609174        10.662129
TA            0.0042134249 0.011746533         0.0079851331        12.878367
WC            0.0177155268 0.014981440         0.0163366320        14.240232
WD            0.0232972311 0.034083695         0.0286702065        25.335182
WG            0.0328547215 0.053142508         0.0429480441        30.663749
WW            0.0093983693 0.006377956         0.0078681474         7.250101
YG            0.0051691399 0.007338639         0.0062618144        11.084111
num_cell      0.0061355526 0.005373049         0.0057463613         5.060577
num_genes     0.0364878788 0.044544488         0.0404558096        32.745034
position      0.0025375614 0.011566496         0.0070255302        10.070505
freq_hypo     0.0008723241 0.001757602         0.0013181209         1.930695
freq_intra    0.0009449492 0.001943090         0.0014431451         2.611950
log_hypo      0.0004514713 0.001366561         0.0009096419         1.736749
acid_per      0.0125815445 0.023360179         0.0179634375        21.131681
base_per      0.0070077737 0.012196570         0.0096129124        13.675893
charge_per    0.0095668425 0.024125997         0.0168345956        20.969665
hydrophob_per 0.0185736697 0.031941513         0.0252200036        25.994903
polar_per     0.0169369327 0.023633413         0.0202776247        20.890415
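
For anyone who wants to reproduce a table of this shape, here is a minimal
sketch. It assumes the randomForest package is installed and uses the
built-in iris data purely as a stand-in for my data. My reading of the Value
section of ?randomForest is that, for classification, the first columns of
the importance matrix are class-specific mean decreases in accuracy (one per
class level), followed by MeanDecreaseAccuracy over all classes and
MeanDecreaseGini. If so, the columns labelled 0 and 1 above would be the
per-class measures.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(1)  # the forest is random, so fix the seed for repeatability

# iris is only a stand-in data set; importance = TRUE asks the forest to
# compute the permutation-based measures in addition to the Gini-based one
fit <- randomForest(Species ~ ., data = iris, importance = TRUE)

# One column per class level, then MeanDecreaseAccuracy and MeanDecreaseGini
print(colnames(fit$importance))

# importance() extracts one measure at a time:
#   type = 1 is the permutation-based mean decrease in accuracy
#   type = 2 is the mean decrease in node impurity (Gini)
imp_perm <- importance(fit, type = 1, scale = FALSE)
imp_gini <- importance(fit, type = 2)

print(imp_perm)
print(imp_gini)
```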

On Thu, Apr 29, 2010 at 5:22 AM, Liaw, Andy <andy_l...@merck.com> wrote:

> Please see the "Details" section of the help page for the importance()
> function in the randomForest package, and let me know which part of it you
> do not understand.
>
> For boosting, you need to read its documentation and decide for yourself if
> its importance measure is at all comparable to the two in RF.
>
> Andy
>
>  ------------------------------
> *From:* Changbin Du [mailto:changb...@gmail.com]
> *Sent:* Wednesday, April 28, 2010 8:58 PM
> *To:* Liaw, Andy
> *Cc:* r-help@r-project.org
> *Subject:* variable importance in Random Forest
>
> Dear Andy,
>
> I ran randomForest in R and got the following results for variable
> importance.
>
> What is the meaning of MeanDecreaseAccuracy and MeanDecreaseGini?
>
> As far as I can tell they are raw values and are not scaled to 1, right?
>
> Which column is most similar to the rel.influence variable in boosting?
>
> Thanks so much!
>
>
>
> > fit$importance
>                          0           1 MeanDecreaseAccuracy MeanDecreaseGini
> CT            0.0022352025 0.003829344         0.0030311246         5.184427
> DP            0.0069461974 0.016387520         0.0116650960        15.440624
> DY            0.0141150255 0.026031690         0.0200603555        19.901538
> FC            0.0024279188 0.005158945         0.0037948155         5.527078
> NE            0.0352705133 0.070503233         0.0527718526        46.278504
> NW            0.0256059127 0.034433862         0.0299981496        26.440402
> QT            0.0037228694 0.008181262         0.0059571350         9.308828
> SK            0.0048187014 0.008895719         0.0068609174        10.662129
> TA            0.0042134249 0.011746533         0.0079851331        12.878367
> WC            0.0177155268 0.014981440         0.0163366320        14.240232
> WD            0.0232972311 0.034083695         0.0286702065        25.335182
> WG            0.0328547215 0.053142508         0.0429480441        30.663749
> WW            0.0093983693 0.006377956         0.0078681474         7.250101
> YG            0.0051691399 0.007338639         0.0062618144        11.084111
> num_cell      0.0061355526 0.005373049         0.0057463613         5.060577
> num_genes     0.0364878788 0.044544488         0.0404558096        32.745034
> position      0.0025375614 0.011566496         0.0070255302        10.070505
> freq_hypo     0.0008723241 0.001757602         0.0013181209         1.930695
> freq_intra    0.0009449492 0.001943090         0.0014431451         2.611950
> log_hypo      0.0004514713 0.001366561         0.0009096419         1.736749
> acid_per      0.0125815445 0.023360179         0.0179634375        21.131681
> base_per      0.0070077737 0.012196570         0.0096129124        13.675893
> charge_per    0.0095668425 0.024125997         0.0168345956        20.969665
> hydrophob_per 0.0185736697 0.031941513         0.0252200036        25.994903
> polar_per     0.0169369327 0.023633413         0.0202776247        20.890415
>
> --
> Sincerely,
> Changbin
> --

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.