Dave,

> I have been using random forest on a data set with 226 sites and 36
> explanatory variables (continuous and categorical). When I use
> "tune.randomForest" to determine the best value to use in "mtry" there
> is a fairly consistent and steady decrease in MSE, with the optimum of
> "mtry" usually equal to 1. Why would that occur, and what does it
> signify? What I would assume is that most of my explanatory variables
> have little to no explanatory power. Does that sound about right?

I'm not sure that it means anything (I've seen this happen too). Essentially, it indicates that, for this particular dataset, the random forest model needs the trees to be as uncorrelated as possible: with mtry = 1, each split considers a single randomly chosen predictor, which decorrelates the trees as much as it can. If, instead, the model "liked" mtry = the number of predictors, that would indicate that bagging was the optimal model. The "no free lunch" theorem applies here to the possible random forest sub-models: without information about the specifics of the problem at hand, there is no reason to believe that any one model is uniformly best across problems.

Did you have any subject-specific reason to think that larger values of mtry were optimal? What was the difference in performance across all of the candidate values of mtry? I don't usually see a huge effect from altering mtry (a change in accuracy or R-squared of <= 5% in classification and regression models, respectively) relative to the variation in resampling.

Max
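
PS: one way to judge the size of the mtry effect against the resampling noise is the performance table that tune.randomForest() already computes. A rough, untested sketch (the simulated data just stand in for your 226 x 36 set; assumes the e1071 and randomForest packages are installed):

    library(e1071)         # tune() wrappers, including tune.randomForest()
    library(randomForest)

    ## simulated stand-in for the real 226-site, 36-variable data
    set.seed(1)
    x <- data.frame(matrix(rnorm(226 * 36), nrow = 226))
    y <- rnorm(226)

    ## tune() uses 10-fold cross-validation by default, so the
    ## performance table also reports a dispersion for each setting
    tuned <- tune.randomForest(x, y, mtry = c(1, 2, 4, 6, 12, 36))

    tuned$performances     # columns: mtry, error (MSE here), dispersion
    tuned$best.parameters
    plot(tuned)            # error profile across the candidate mtry values

If the drop in MSE from mtry = 36 down to mtry = 1 is small relative to the dispersion column, the "optimum" at 1 probably isn't telling you much.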