I have a question about predictor importances in randomForest. Once I've run randomForest and have my fitted object, I can get the predictor importances:

    rfresult$importance

I also get the "standard errors" of the permutation-based importance measure:

    rfresult$importanceSD
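A minimal sketch of this setup, for concreteness. It assumes the randomForest package is installed and uses the built-in mtcars data as a hypothetical stand-in for the real data; the names perm_raw and perm_z are illustrative only. It prints the two components mentioned above, forms the ratio that question 2 below asks about, and calls the package's importance() extractor with scale = FALSE and scale = TRUE so the results can be compared side by side.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(1)  # arbitrary seed, only so the forest is reproducible

# Hypothetical regression problem: predict mpg from the other mtcars columns.
# importance = TRUE requests the permutation-based measure as well.
rfresult <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# Two-column matrix for regression: "%IncMSE" and "IncNodePurity"
rfresult$importance

# "Standard errors" of the permutation-based measure, one value per predictor
rfresult$importanceSD

# Candidate z-score-like quantity: raw permutation importance over its SE
perm_raw <- rfresult$importance[, "%IncMSE"]
perm_z   <- perm_raw / rfresult$importanceSD

# The package's own extractor, for comparison (type = 1 is the permutation measure)
importance(rfresult, type = 1, scale = FALSE)
importance(rfresult, type = 1, scale = TRUE)
```

Comparing perm_z with the scale = TRUE output is a quick way to check whether the extractor already performs the division by the standard error.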
I have two questions:

1. Because I am dealing with regression, I get an importance object (rfresult$importance) with two columns, labeled "%IncMSE" (the first column) and "IncNodePurity" (the second column). I assume it is the first one that is the mean decrease in accuracy due to permutation. Is that correct? I am confused because ?randomForest says: "For Regression, the first column is the mean decrease in accuracy and the second the mean decrease in MSE." Yet it is the first column, not the second, that has "MSE" in its header.

2. According to this thread (http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg94873.html), the overall importance measure is mean(d[i]) / se(d[i]), where se(d[i]) is sd(d[i]) / sqrt(ntree) (the "standard error"). So, to get at the importance of predictors (and I want to use the permutation-based importance), should I just take the first column of rfresult$importance, or should I first divide rfresult$importance by rfresult$importanceSD to get something analogous to z-scores, and use those?

Thank you very much!

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.