I don't think this is so hard to explain. If you evaluate AUC using OOB predictions or a test set (or something like CV or the bootstrap), that is what I would expect for most data. When you add more variables (that are, say, less informative) to a model, the model has to look harder to find the informative ones, and thus you pay a penalty. One exception: if some of the "new" variables happen to interact very strongly with some of the "old" variables, you may see improved performance.
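To make the point concrete, here is a minimal sketch on simulated data, assuming the randomForest package is installed. The data, the variable names (old1..old5, new1..new50), the effect sizes, and the helper functions auc() and fit_auc() are all made up for illustration and are not from the original question. It fits a forest on the "old" variables alone and on "old" plus many mostly uninformative "new" variables, and reports both the OOB AUC and the training-set AUC for each fit.

```r
# Illustrative sketch only: simulated data, hypothetical names and sizes.
library(randomForest)

set.seed(1)
n <- 400
x_old <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("old", 1:5)))
x_new <- matrix(rnorm(n * 50), n, 50, dimnames = list(NULL, paste0("new", 1:50)))

# Assumed truth: the outcome depends mainly on two "old" variables
# and only weakly on one "new" variable; the rest are pure noise.
lp <- x_old[, 1] + 0.5 * x_old[, 2] + 0.2 * x_new[, 1]
y <- factor(rbinom(n, 1, plogis(lp)))

# AUC via the Wilcoxon rank-sum statistic (ties receive average ranks).
auc <- function(score, y) {
  pos <- y == "1"
  n1 <- sum(pos)
  n0 <- sum(!pos)
  (sum(rank(score)[pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Fit a forest and return OOB and training-set (resubstitution) AUC.
fit_auc <- function(x, y) {
  rf <- randomForest(x, y, ntree = 500)
  c(oob   = auc(rf$votes[, "1"], y),                      # OOB vote fractions
    train = auc(predict(rf, x, type = "prob")[, "1"], y)) # same data used to fit
}

res <- rbind(old_only     = fit_auc(x_old, y),
             old_plus_new = fit_auc(cbind(x_old, x_new), y))
print(round(res, 3))
```

The column to compare across rows is oob. The train column is shown only for contrast: it comes from predicting the same cases the trees were grown on, so it tends to sit close to 1 whatever variables go in, and says little about how the model will do on new data.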
I've said it several times before, but it seems to be worth repeating: don't use the training set for evaluating models. That almost never makes sense.

Andy

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of matt
Sent: Friday, May 11, 2012 3:43 PM
To: r-help@r-project.org
Subject: [R] Random forests prediction

Hi all,

I have a strange problem when applying RF in R. I have a set of variables with which I obtain an AUC of 0.67. I have a second set of variables that has an AUC of 0.57. When I merge the first and second sets of variables, the AUC becomes 0.64. I would expect the prediction to become better as I add variables that do have some predictive power. This is even stranger because the AUC on the training set increased when I added more variables (while the AUC on the validation set decreased). Has anyone experienced the same thing, and/or does anyone know what the reason could be?

Thanks,
Matthijs

--
View this message in context: http://r.789695.n4.nabble.com/Random-forests-prediction-tp4627409.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.